How to create a hash of arrays in Perl

I have data like this:

```
Group AT1G01040-TAIR-G
	LOC_Os03g02970 69%
Group AT1G01050-TAIR-G
	LOC_Os10g26600 85%
	LOC_Os10g26633 35%
Group AT1G01090-TAIR-G
	LOC_Os04g02900 74%
```

How can I create a data structure that looks like this?

```perl
print Dumper \%big;

$VAR1 = {
    "Group AT1G01040-TAIR-G" => ['LOC_Os03g02970 69%'],
    "Group AT1G01050-TAIR-G" => ['LOC_Os10g26600 85%', 'LOC_Os10g26633 35%'],
    "Group AT1G01090-TAIR-G" => ['LOC_Os04g02900 74%'],
};
```

This is my attempt, but it fails:

```perl
my %big;
while ( <> ) {
    chomp;
    my $line = $_;
    my $head = "";
    my @temp;
    if ( $line =~ /^Group/ ) {
        $head = $line;
        $head =~ s/[\r\s]+//g;
        @temp = ();
    } elsif ($line =~ /^\t/) {
        my $cont = $line;
        $cont =~ s/[\t\r]+//g;
        push @temp, $cont;
        push @{$big{$head}}, @temp;
    };
}
```

Best answer

Here's how I'd do it:

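```perl
my %big;
my $currentGroup;    # persists across lines: array ref of the current group

while (my $line = <>) {
    chomp $line;
    if ($line =~ /^Group/) {
        # New header: create an empty array ref, store it, and remember it
        $big{$line} = $currentGroup = [];
    } elsif ($line =~ s/^\t+//) {
        # Tab-indented member line (leading tabs stripped): add to current group
        push @$currentGroup, $line;
    }
}
```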
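To try the loop above without a separate input file, here is a minimal self-contained sketch. It assumes each `LOC_...` line starts with a tab (as the `/^\t/` test in the question implies) and reads the sample data from an in-memory filehandle instead of `<>`; the `$data` string and the final print loop are illustrative additions, not part of the original answer.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed input layout: headers at column 0, members indented with a tab.
my $data = "Group AT1G01040-TAIR-G\n"
         . "\tLOC_Os03g02970 69%\n"
         . "Group AT1G01050-TAIR-G\n"
         . "\tLOC_Os10g26600 85%\n"
         . "\tLOC_Os10g26633 35%\n"
         . "Group AT1G01090-TAIR-G\n"
         . "\tLOC_Os04g02900 74%\n";

# In-memory filehandle stands in for <> so the sketch runs on its own.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my %big;
my $currentGroup;
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^Group/) {
        # Start a new array ref and remember it for the following lines.
        $big{$line} = $currentGroup = [];
    } elsif ($line =~ s/^\t+//) {
        push @$currentGroup, $line;
    }
}
close $fh;

# Sorted keys make the dump order deterministic.
$Data::Dumper::Sortkeys = 1;
print Dumper \%big;

# Walking the hash of arrays: each value is an array reference.
for my $group (sort keys %big) {
    print "$group: ", join(', ', @{ $big{$group} }), "\n";
}
```

Each hash value is an array reference, so `@{ $big{$group} }` dereferences it back into a list.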

You should probably add some error checking to this, e.g. an `else` clause that warns about lines matching neither regex. Also, check whether `$currentGroup` is undef before pushing, in case the first line begins with a tab instead of "Group".
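Here is one way those checks could look, as a sketch that assumes a `warn` is enough (you might prefer `die` for strict input). Line numbers come from Perl's built-in `$.`; the sample data, with a member line before any header and one unrecognized line, is hypothetical, and the `@problems` array is only there so the results can be inspected.

```perl
use strict;
use warnings;

# Hypothetical input: a member line before any "Group" header,
# and one line that matches neither pattern.
my $data = "\tLOC_orphan 10%\n"
         . "Group AT1G01040-TAIR-G\n"
         . "\tLOC_Os03g02970 69%\n"
         . "unexpected line\n";

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my %big;
my $currentGroup;
my @problems;    # collected so the caller can inspect them

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^Group/) {
        $big{$line} = $currentGroup = [];
    } elsif ($line =~ s/^\t+//) {
        # Guard against member lines that appear before the first header.
        if (!defined $currentGroup) {
            my $msg = "line $.: member before any Group header";
            push @problems, $msg;
            warn "$msg\n";
            next;
        }
        push @$currentGroup, $line;
    } else {
        # Neither a header nor a tab-indented member line.
        my $msg = "line $.: unrecognized";
        push @problems, $msg;
        warn "$msg\n";
    }
}
close $fh;
```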

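The biggest problem with your original code is that you declare and initialize `$head` and `@temp` inside the loop, so they are reset on every line. By the time a tab-indented line is processed, `$head` is back to `""`, so every member ends up under the empty-string key and the "Group" keys are never stored. Variables that need to persist across lines have to be declared outside the loop, as I've done with `$currentGroup`.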
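The reset effect is easy to see in isolation. In this small sketch (the variable names and sample lines are illustrative), a `my` variable declared inside the loop body starts fresh on every iteration, while one declared before the loop keeps its value:

```perl
use strict;
use warnings;

my @lines = ('Group A', "\tx", "\ty");

my $outer;          # declared once, persists across iterations
my @seen_inner;     # value of $inner at each member line
my @seen_outer;     # value of $outer at each member line

for my $line (@lines) {
    my $inner = '';     # re-created (and reset) on every iteration
    if ($line =~ /^Group/) {
        $inner = $line;
        $outer = $line;
    } else {
        push @seen_inner, $inner;
        push @seen_outer, $outer;
    }
}

print "inner: [@seen_inner]\n";   # header already forgotten
print "outer: [@seen_outer]\n";   # header still remembered
```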

I'm not quite sure what you intend with the `s/[\r\s]+//g;` bit. `\r` is already included in `\s`, so it means the same as `s/\s+//g;`, which strips all whitespace; but your desired result hash keeps whitespace in its keys. If you want to strip only trailing whitespace, you need to include an anchor: `s/\s+\z//`.
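A quick comparison of the two substitutions on a header carrying a stray trailing space and carriage return (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $header = "Group AT1G01040-TAIR-G \r";

# Copy, then substitute on the copy, leaving $header untouched.
(my $all      = $header) =~ s/[\r\s]+//g;   # removes every whitespace char, including the inner space
(my $trailing = $header) =~ s/\s+\z//;      # removes only whitespace at the very end

print "all:      '$all'\n";
print "trailing: '$trailing'\n";
```

Only the anchored version keeps the space between `Group` and the ID, which is what the desired keys need.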
