Grouping column disappears after groupby in Pandas

Z时代
2024-01-10
Category: Q&A

I have the following DataFrame named ttm:

   usersidid  clienthostid  eventSumTotal  LoginDaysSum  score
0         12             1             60             3   1728
1         11             1            240             3   1331
3          5             1              5             3    125
4          6             1             16             2    216
2         10             3            270             3   1000
5          8             3             18             2    512
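To reproduce the examples in this thread, ttm can be rebuilt from the printed frame. This is a minimal sketch that assumes every column is a plain integer column and keeps the unsorted index order shown above:

```python
import pandas as pd

# Rebuild the question's DataFrame, keeping the printed (unsorted) index.
ttm = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8],
        "clienthostid": [1, 1, 1, 1, 3, 3],
        "eventSumTotal": [60, 240, 5, 16, 270, 18],
        "LoginDaysSum": [3, 3, 3, 2, 3, 2],
        "score": [1728, 1331, 125, 216, 1000, 512],
    },
    index=[0, 1, 3, 4, 2, 5],
)

print(ttm)
```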

When I do

ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].count()

I get what I expected (although I would like the result to be under a new label named 'ratio'):

   clienthostid  LoginDaysSum
0             1             4
1             3             2

But when I do

ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].apply(lambda x: x.iloc[0] / x.iloc[1])

I get:

0    1.0
1    1.5

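Where did the labels go? I still need the grouped 'clienthostid', and I also need the result of apply to be under a label.

Sometimes when I do a groupby, some other columns still appear. Why do columns sometimes disappear and sometimes stay? Is there a flag I'm missing that controls this?

Also, in the example I gave, when I did count the result appeared under the label 'LoginDaysSum'. Is there a way to put the result under a new label instead?

Thanks,

Answer: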

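There are two ways to get a DataFrame back after groupby:

1. the parameter as_index=False, which works nicely with the count, sum and mean functions;

2. reset_index, which creates new columns from the index levels; this is the more general solution.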
#solution 1: as_index=False keeps clienthostid as a column
df = ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].count()
print (df)
   clienthostid  LoginDaysSum
0             1             4
1             3             2

#solution 2: move clienthostid from the index back into a column
df = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'].count().reset_index()
print (df)
   clienthostid  LoginDaysSum
0             1             4
1             3             2
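The question also asks how to put the count under a new label. A minimal sketch of two options, assuming pandas 0.25 or later (needed for named aggregation); the label LoginDaysCount is a hypothetical name chosen for illustration:

```python
import pandas as pd

ttm = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8],
        "clienthostid": [1, 1, 1, 1, 3, 3],
        "eventSumTotal": [60, 240, 5, 16, 270, 18],
        "LoginDaysSum": [3, 3, 3, 2, 3, 2],
        "score": [1728, 1331, 125, 216, 1000, 512],
    },
    index=[0, 1, 3, 4, 2, 5],
)

# Option A: the count is a Series indexed by clienthostid;
# reset_index(name=...) turns the index into a column and names the values.
counts_a = (
    ttm.groupby("clienthostid", sort=False)["LoginDaysSum"]
    .count()
    .reset_index(name="LoginDaysCount")  # hypothetical label
)

# Option B: named aggregation, new_label=(source_column, function);
# as_index=False keeps clienthostid as a regular column.
counts_b = ttm.groupby("clienthostid", as_index=False, sort=False).agg(
    LoginDaysCount=("LoginDaysSum", "count")
)

print(counts_a)
print(counts_b)
```

Option A reuses the reset_index approach from the answer; option B names the output column in the same call that computes it.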

For the second requirement, remove as_index=False and add reset_index instead:

#output is `Series`
a = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'] \
         .apply(lambda x: x.iloc[0] / x.iloc[1])
print (a)
clienthostid
1    1.0
3    1.5
Name: LoginDaysSum, dtype: float64
print (type(a))
<class 'pandas.core.series.Series'>
print (a.index)
Int64Index([1, 3], dtype='int64', name='clienthostid')
df1 = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'] \
         .apply(lambda x: x.iloc[0] / x.iloc[1]).reset_index(name='ratio')
print (df1)
   clienthostid  ratio
0             1    1.0
1             3    1.5
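If the goal is a DataFrame with a 'ratio' column in a single step, named aggregation also accepts a callable. This is a sketch of an alternative to apply plus reset_index(name='ratio'), not the answer's own method; like the original lambda, it assumes every group has at least two rows (x.iloc[1] raises IndexError otherwise):

```python
import pandas as pd

ttm = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8],
        "clienthostid": [1, 1, 1, 1, 3, 3],
        "eventSumTotal": [60, 240, 5, 16, 270, 18],
        "LoginDaysSum": [3, 3, 3, 2, 3, 2],
        "score": [1728, 1331, 125, 216, 1000, 512],
    },
    index=[0, 1, 3, 4, 2, 5],
)

# Named aggregation with a lambda: each group's LoginDaysSum values arrive
# as a Series x, and the scalar result lands in a column called "ratio".
ratio_df = ttm.groupby("clienthostid", as_index=False, sort=False).agg(
    ratio=("LoginDaysSum", lambda x: x.iloc[0] / x.iloc[1])
)

# The answer's approach, for comparison.
ratio_via_apply = (
    ttm.groupby("clienthostid", sort=False)["LoginDaysSum"]
    .apply(lambda x: x.iloc[0] / x.iloc[1])
    .reset_index(name="ratio")
)

print(ratio_df)
```

Both produce the same two columns, clienthostid and ratio, with ratios 1.0 and 1.5.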

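Why do some columns disappear?

I think nuisance columns, i.e. columns the aggregation cannot be applied to, such as the string column below, are excluded automatically. This holds for pandas versions before 2.0; from 2.0 on they are no longer silently dropped, so pass numeric_only=True or select the columns explicitly. Example with the older behavior: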

#convert column to str
ttm.usersidid = ttm.usersidid.astype(str) + 'aa'
print (ttm)
  usersidid  clienthostid  eventSumTotal  LoginDaysSum  score
0      12aa             1             60             3   1728
1      11aa             1            240             3   1331
3       5aa             1              5             3    125
4       6aa             1             16             2    216
2      10aa             3            270             3   1000
5       8aa             3             18             2    512

#the str column usersidid was removed
a = ttm.groupby(['clienthostid'], sort=False).sum()
print (a)
              eventSumTotal  LoginDaysSum  score
clienthostid
1                       321            11   3400
3                       288             5   1512
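To make the result independent of this automatic behavior, choose the columns yourself. A minimal sketch, assuming pandas 1.x or 2.x, showing two options that give the same frame as the output above: select the numeric columns explicitly, or pass numeric_only=True to sum():

```python
import pandas as pd

ttm = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8],
        "clienthostid": [1, 1, 1, 1, 3, 3],
        "eventSumTotal": [60, 240, 5, 16, 270, 18],
        "LoginDaysSum": [3, 3, 3, 2, 3, 2],
        "score": [1728, 1331, 125, 216, 1000, 512],
    },
    index=[0, 1, 3, 4, 2, 5],
)
# Turn usersidid into a string column, as in the answer.
ttm["usersidid"] = ttm["usersidid"].astype(str) + "aa"

# Option A: aggregate only the columns you name.
numeric_cols = ["eventSumTotal", "LoginDaysSum", "score"]
a = ttm.groupby("clienthostid", sort=False)[numeric_cols].sum()

# Option B: ask sum() to skip non-numeric columns.
b = ttm.groupby("clienthostid", sort=False).sum(numeric_only=True)

print(a)
```

Option A is the most explicit: the output columns are exactly the ones listed, whatever the other columns' dtypes.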

Source: utcz.com/qa/410265.html
