Converting a Pandas GroupBy output from Series to DataFrame

I'm starting with input data like this:

    import pandas

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    })

When printed, it appears like this:

           City     Name
    0   Seattle    Alice
    1   Seattle      Bob
    2  Portland  Mallory
    3   Seattle  Mallory
    4   Seattle      Bob
    5  Portland  Mallory

Grouping is simple enough:

    g1 = df1.groupby(["Name", "City"]).count()

and printing yields a GroupBy object:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
            Seattle      1     1

But what I eventually want is another DataFrame object that contains all the rows in the GroupBy object. In other words, I want to get the following result:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
    Mallory Seattle      1     1

I can't quite see how to accomplish this in the pandas documentation. Any hints would be welcome.

Best answer

g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]:
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
           ('Mallory', 'Seattle')], dtype=object)
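The same check can be written as a short script. This is only a sketch, and it assumes a recent pandas release, where the grouping keys are kept in the index rather than repeated as data columns; under that assumption `g1` has no data columns here and a plain `reset_index()` does not clash with existing column names. The names `is_frame`, `has_multiindex`, `index_names` and `flat` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

g1 = df1.groupby(["Name", "City"]).count()

# The aggregated result is already a DataFrame, not a GroupBy object.
is_frame = isinstance(g1, pd.DataFrame)

# The group keys live in a two-level (hierarchical) index.
has_multiindex = isinstance(g1.index, pd.MultiIndex)
index_names = list(g1.index.names)

# reset_index() moves the index levels back into ordinary columns,
# so every row carries its own Name and City values.
flat = g1.reset_index()
print(flat)
```

The blank cell under "Mallory" in the printed `g1` is only how a hierarchical index is displayed; the row itself is still keyed by `('Mallory', 'Seattle')`.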

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name      City  City_Count  Name_Count
    0    Alice   Seattle           1           1
    1      Bob   Seattle           2           2
    2  Mallory  Portland           2           2
    3  Mallory   Seattle           1           1

Or something like:

    In [36]: DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
    Out[36]:
          Name      City  count
    0    Alice   Seattle      1
    1      Bob   Seattle      2
    2  Mallory  Portland      2
    3  Mallory   Seattle      1
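A possibly shorter way to get the same single-count table is to pass `name=` to `Series.reset_index`, which avoids building the intermediate DataFrame by hand. This is a sketch rather than the answerer's own code; the variable name `counts` is chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

counts = (
    df1.groupby(["Name", "City"])
    .size()                     # Series of group sizes, indexed by (Name, City)
    .reset_index(name="count")  # index levels become columns; sizes go in "count"
)
print(counts)
```

Because `size()` counts rows per group instead of non-null values per column, it yields one count column regardless of how many other columns the frame has.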
