How to create hash of array in Perl
I have data like this (each LOC_ line is indented with a tab):
    Group AT1G01040-TAIR-G
        LOC_Os03g02970 69%
    Group AT1G01050-TAIR-G
        LOC_Os10g26600 85%
        LOC_Os10g26633 35%
    Group AT1G01090-TAIR-G
        LOC_Os04g02900 74%

How can I create a data structure that looks like this?
    print Dumper \%big;

    $VAR1 = {
        "Group AT1G01040-TAIR-G" => ['LOC_Os03g02970 69%'],
        "Group AT1G01050-TAIR-G" => ['LOC_Os10g26600 85%', 'LOC_Os10g26633 35%'],
        "Group AT1G01090-TAIR-G" => ['LOC_Os04g02900 74%']
    };

This is my attempt, but it fails:
    my %big;
    while ( <> ) {
        chomp;
        my $line = $_;
        my $head = "";
        my @temp;
        if ( $line =~ /^Group/ ) {
            $head = $line;
            $head =~ s/[\r\s]+//g;
            @temp = ();
        } elsif ($line =~ /^\t/) {
            my $cont = $line;
            $cont =~ s/[\t\r]+//g;
            push @temp, $cont;
            push @{$big{$head}}, @temp;
        };
    }

Best answer:
Here's how I'd do it:
    my %big;
    my $currentGroup;    # array ref for the most recent "Group" line
    while (my $line = <>) {
        chomp $line;
        if ($line =~ /^Group/) {
            $big{$line} = $currentGroup = [];    # new group: start an empty list
        } elsif ($line =~ s/^\t+//) {            # member line: strip leading tabs
            push @$currentGroup, $line;
        }
    }

You should probably add some additional error checking to this, e.g. an else clause to warn about lines that don't match either regex. Also, check whether $currentGroup is undef before pushing (in case the first line begins with a tab instead of "Group").
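A minimal sketch of the error checking suggested above, assuming the input format from the question (tab-indented member lines). It reads from an in-memory string instead of <> so it runs standalone; the sample data, the warning texts, and the extra CR stripping are illustrative additions, not part of the original answer.

```perl
use strict;
use warnings;
use Data::Dumper;

# Illustrative sample input in the question's format (members are tab-indented).
my $input = join("\n",
    "Group AT1G01040-TAIR-G",
    "\tLOC_Os03g02970 69%",
    "Group AT1G01050-TAIR-G",
    "\tLOC_Os10g26600 85%",
    "\tLOC_Os10g26633 35%",
    "Group AT1G01090-TAIR-G",
    "\tLOC_Os04g02900 74%",
) . "\n";

# Open the string as an in-memory file so the loop looks like while (<>).
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

my %big;
my $currentGroup;    # array ref for the most recent "Group" line
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/\r\z//;    # assumption: tolerate CRLF line endings
    if ($line =~ /^Group/) {
        $big{$line} = $currentGroup = [];
    } elsif ($line =~ s/^\t+//) {
        if (defined $currentGroup) {
            push @$currentGroup, $line;
        } else {
            # A member line appeared before any "Group" header.
            warn "Line $.: member line before any Group header: $line\n";
        }
    } else {
        # The else clause: the line matched neither pattern.
        warn "Line $.: unrecognized line: $line\n";
    }
}
close $fh;

$Data::Dumper::Sortkeys = 1;    # print hash keys in a stable, sorted order
print Dumper(\%big);
```

Sorting the keys only makes the dump deterministic; the hash itself is unchanged.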
The biggest problem with your original code is that you're declaring and initializing $head and @temp inside the loop, which means they get reset on every line. Variables that need to persist across lines have to be declared outside the loop, as I've done with $currentGroup.
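To illustrate the scoping point, here is a sketch of the original attempt with $head declared outside the loop. Note that moving `my @temp` outside as well would not be enough: `push @{$big{$head}}, @temp` would then re-push every earlier member on each new line and duplicate entries, so this version pushes each cleaned line directly. The in-memory sample input is an assumption made so the snippet runs on its own.

```perl
use strict;
use warnings;

# Illustrative sample input in the question's format (members are tab-indented).
my $input = join("\n",
    "Group AT1G01040-TAIR-G",
    "\tLOC_Os03g02970 69%",
    "Group AT1G01050-TAIR-G",
    "\tLOC_Os10g26600 85%",
    "\tLOC_Os10g26633 35%",
) . "\n";

# Read the string like a file so the loop mirrors while (<>) { ... }.
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

my %big;
my $head = "";    # declared OUTSIDE the loop, so it keeps its value between lines
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^Group/) {
        $head = $line;
        $head =~ s/\s+\z//;    # trim trailing whitespace only, keep inner spaces
    } elsif ($line =~ /^\t/) {
        my $cont = $line;
        $cont =~ s/[\t\r]+//g;    # same cleanup as the original attempt
        push @{ $big{$head} }, $cont;    # push just this line; no @temp needed
    }
}
close $fh;
```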
I'm not quite sure what you're intending to accomplish with the s/[\r\s]+//g; bit. \r is included in \s, so that means the same as s/\s+//g; (which would strip all whitespace), but your desired result hash includes whitespace in your keys. If you want to strip trailing whitespace, you need to include an anchor: s/\s+\z//.
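A small comparison of the two substitutions discussed above, using an assumed header line that ends in a space and a carriage return. The unanchored version removes every whitespace character, including the one inside the key; the \z-anchored version removes only the trailing run.

```perl
use strict;
use warnings;

my $raw = "Group AT1G01040-TAIR-G \r";    # example header with trailing space and CR

(my $all      = $raw) =~ s/[\r\s]+//g;    # same as s/\s+//g: strips ALL whitespace
(my $trailing = $raw) =~ s/\s+\z//;       # anchored at end of string: trailing only

print "all:      '$all'\n";         # prints 'GroupAT1G01040-TAIR-G'
print "trailing: '$trailing'\n";    # prints 'Group AT1G01040-TAIR-G'
```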