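pandas - using an aggregate function on a column over the last N values of a datetime column in the same dataframe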

Z时代
2024-01-10
Category: Q&A

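I have a dataframe containing sports betting data: match_id, team_id, goals_scored, and a datetime column holding the time each match started. I want to add a column to this dataframe that shows, for each row, the total number of goals that team scored in its previous n matches.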

Answer:

I wrote some mock data because I like football, but as Jacob H suggested, it is always best to provide a sample dataframe with the question.

import numpy as np
import pandas as pd

np.random.seed(2)
d = {
    'match_id': np.arange(10),
    'team_id': ['City', 'City', 'City', 'Utd', 'Utd', 'Utd', 'Albion', 'Albion', 'Albion', 'Albion'],
    'goals_scored': np.random.randint(0, 5, 10),
    'time_played': [0, 1, 2, 0, 1, 2, 0, 1, 2, 3],
}
df = pd.DataFrame(data=d)
# number of previous matches to sum over
n = 2
# weekly 3pm kickoffs (freq='W' anchors on Sundays: 2017-12-03, -10, -17, -24)
rng = pd.date_range('2017-12-02 15:00:00', '2017-12-25 15:00:00', freq='W')
# change the time_played integers to the datetimes
df['time_played'] = df['time_played'].map(lambda x: rng[x])
# be sure the sort order is correct
df = df.sort_values(['team_id', 'time_played'])
# rolling sum per team; transform keeps the original index so the result aligns with df
df['total_goals'] = df.groupby('team_id')['goals_scored'].transform(lambda x: x.rolling(n).sum())
# shift(1) within each team so each row shows the total for the previous n matches only
df['total_goals'] = df.groupby('team_id')['total_goals'].shift(1)

Which produces (total_goals is the sum over the previous n matches):

   goals_scored  match_id team_id         time_played  total_goals
6             2         6  Albion 2017-12-03 15:00:00          NaN
7             1         7  Albion 2017-12-10 15:00:00          NaN
8             3         8  Albion 2017-12-17 15:00:00          3.0
9             2         9  Albion 2017-12-24 15:00:00          4.0
0             0         0    City 2017-12-03 15:00:00          NaN
1             0         1    City 2017-12-10 15:00:00          NaN
2             3         2    City 2017-12-17 15:00:00          0.0
3             2         3     Utd 2017-12-03 15:00:00          NaN
4             3         4     Utd 2017-12-10 15:00:00          NaN
5             0         5     Utd 2017-12-17 15:00:00          5.0
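For reference, the rolling-sum idea can also be written as a single per-team transform. This is only a sketch: the helper name add_previous_n_goals and the hard-coded goal values (copied from the output above instead of drawn from the random seed) are my own choices for illustration, not part of the answer.

```python
import pandas as pd

def add_previous_n_goals(df, n, out_col="total_goals"):
    """Return a sorted copy with the goals each team scored in its previous n matches."""
    out = df.sort_values(["team_id", "time_played"]).copy()
    # shift(1) drops the current match, rolling(n).sum() adds up the n matches before it
    out[out_col] = (
        out.groupby("team_id")["goals_scored"]
        .transform(lambda s: s.shift(1).rolling(n).sum())
    )
    return out

# hypothetical demo data mirroring the table above
kickoffs = pd.date_range("2017-12-03 15:00:00", periods=4, freq="7D")
demo = pd.DataFrame({
    "match_id": range(10),
    "team_id": ["City"] * 3 + ["Utd"] * 3 + ["Albion"] * 4,
    "goals_scored": [0, 0, 3, 2, 3, 0, 2, 1, 3, 2],
    "time_played": [kickoffs[i] for i in [0, 1, 2, 0, 1, 2, 0, 1, 2, 3]],
})
result = add_previous_n_goals(demo, n=2)
print(result)
```

Rows where a team has fewer than n earlier matches stay NaN, as in the output above. If a partial sum is preferred there, rolling(n, min_periods=1) is one option.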

Answer:

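There may be a more efficient way to do this with aggregate functions, but here is a solution where, for each row, you filter the whole dataframe down to the same team and earlier dates, then sum the goals. Note that this sums all of a team's earlier matches, not only the previous n, and it assumes the datetime column is named datetime.

df['goals_to_date'] = df.apply(
    lambda row: df.loc[(df['team_id'] == row['team_id'])
                       & (df['datetime'] < row['datetime']), 'goals_scored'].sum(),
    axis=1)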
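The row-by-row filter above scans the whole dataframe once per row. A grouped cumulative sum should give the same goals-to-date figure in one pass. This is a sketch rather than the answerer's method: the function name goals_to_date and the sample rows are made up for illustration, and it assumes a team never has two matches with the same timestamp.

```python
import pandas as pd

def goals_to_date(df, time_col="datetime"):
    """Goals each team scored strictly before each match (0 for a team's first match)."""
    ordered = df.sort_values(["team_id", time_col])
    running = ordered.groupby("team_id")["goals_scored"].cumsum()
    # subtract the current match so only earlier matches are counted
    return (running - ordered["goals_scored"]).reindex(df.index)

# hypothetical sample data
matches = pd.DataFrame({
    "team_id": ["City", "Utd", "City", "Utd", "City"],
    "datetime": pd.to_datetime(
        ["2017-12-03", "2017-12-03", "2017-12-10", "2017-12-10", "2017-12-17"]
    ),
    "goals_scored": [1, 2, 0, 3, 4],
})
matches["goals_to_date"] = goals_to_date(matches)
print(matches)
```

The reindex at the end puts the result back in the caller's row order, so the dataframe does not need to be sorted beforehand.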

Source: utcz.com/qa/267147.html
