pandas按某两列分组,求均值,但不加这行的?

pandas按某两列分组,求均值,但不加这行的?
1.想得到一个新列 test['avg_score'] , 想要的结果就是图片的 score_avg_before 列.
score_avg_before = 取小于本行Day_Num的列 ,按 Season 和 T1_TeamID 分组,score列的均值 .
如果本行之前没有,填"NaN".
就是求 这个赛季本球队今天之前的平均进球,所以均值不算当前这行.
这种pandas能算吗?

ps:可以按像网页那样,不是全赛季,只计算最近n场/10场/20场 的均值吗?

----------------------------------自己写的,结果对,速度太慢,怎么改成快的---------------------------------------

不太会用pandas,只写了一行行处理.按我这个逻辑改成 不是一行行处理 快速的也行。

test['score_avg'] = "NaN"

for i in range(0,len(test)):

if test['DayNum'][i] > 1:

tmp_sc = test.loc[test.DayNum<test['DayNum'][i]].reset_index(drop=True).groupby(["Season", 'T1_TeamID'])['score'].agg([np.mean]).reset_index()

test['score_avg'][i] = tmp_sc.loc[(tmp_sc.T1_TeamID==test['T1_TeamID'][i])&(tmp_sc.Season==test['Season'][i])].reset_index(drop=True)['mean'][0]

else:

test['score_avg'][i] = "NaN"

pandas按某两列分组,求均值,但不加这行的?

--------------------------下面是尝试修改快的错误--------------------------------------------------
1.改成这样会错误,下面。

test['score_avg'] =  np.where(test['DayNum'] > 1,test.loc[test.DayNum<test['DayNum']].reset_index(drop=True).groupby(["Season", 'T1_TeamID'])['score'].agg([np.mean]).reset_index().loc[(test.T1_TeamID==test['T1_TeamID'])&(test.Season==test['Season'])].reset_index(drop=True)['mean'][0],"NaN")

KeyError: 0
IndexError: index out of bounds

2.改成这样会错误,下面。

test['score_avg'] =  np.where(test['DayNum'] > 1,test.loc[test.DayNum<test['DayNum']].reset_index(drop=True).groupby(["Season", 'T1_TeamID'])['score'].agg([np.mean]).reset_index().loc[(test.T1_TeamID==test['T1_TeamID'])&(test.Season==test['Season'])].reset_index(drop=True)['mean'],"NaN")

ValueError: operands could not be broadcast together with shapes (21,) (0,) ()

-------------------------DF 创建初始化:------------------------

import pandas as pd

from numpy import nan as NaN

import numpy as np

data = dict()

data['T1_TeamID'] = [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]

data['score'] = [50,60,70,60,70,80,70,80,90,60,70,80,70,80,90,80,90,100,90,100,110]

data['score_avg_before'] = [NaN,NaN,NaN,50,60,70,55,65,75,NaN,NaN,NaN,60,70,80,65,75,85,70,80,90]

data['Season'] = ['2018','2018','2018','2018','2018','2018','2018','2018','2018','2019','2019','2019','2019','2019','2019','2019','2019','2019','2019','2019','2019']

data['DayNum'] = [1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,4,4,4]

#create dataframe

test = pd.DataFrame(data)


回答:

首先,按照Season和T1_TeamID进行分组,
创建一个空表用于存数据,
每个组里面,把score组成一个列表,根据求和第0个到第k-1个,并且除以k-1个,
返回赋值,
最后concat聚合一下

grouped = test.groupby(["Season", 'T1_TeamID'])

groupall = []

for i in grouped:

test111 = i[1]

score_sum = []

k = 0

for j in test111.index:

k += 1

score_sum.append(test111.loc[j,'score'])

if test111.loc[j,'DayNum'] > 1 :

test111.loc[j,'score_avg'] = sum(score_sum[0:(k-1)]) / (len(score_sum)-1)

groupall.append(test111)

groupall1 = pd.concat(groupall)

以上是 pandas按某两列分组,求均值,但不加这行的? 的全部内容, 来源链接: utcz.com/a/158532.html

回到顶部