Pandas group by weekday(M/T/W/T/F/S/S)

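I have a pandas DataFrame with a time series in the form YYYY-MM-DD ('arrival_date') as its index, and I want to group by each weekday, Monday through Sunday, in order to compute the mean, median, standard deviation and so on of the other columns. I should end up with only seven rows. So far I only know how to group by week, which aggregates everything week by week.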

import pandas as pd

# Reading the data
df_data = pd.read_csv('data.csv', delimiter=',')

# Providing the correct format for the data
# (assign back to the column so the DataFrame itself is not overwritten)
df_data['arrival_date'] = pd.to_datetime(df_data['arrival_date'], format='%Y%m%d')

# Converting the time series column to index
df_data = df_data.set_index('arrival_date')

# Grouping by week (= ~52 rows per year)
week_df = df_data.resample('W').mean()

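Is there a simple way to achieve this with pandas? I was thinking of selecting every 7th element and performing the operations on the resulting arrays, but that seems unnecessarily complicated.

The head of the DataFrame looks like this: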

    arrival_date    price_1      price_2    price_3    price_4
 2      20170816  75.945298  1309.715056  71.510215  22.721958
 3      20170817  68.803269  1498.639663  64.675232  22.759137
 4      20170818  73.497144  1285.122022  65.620260  24.381532
 5      20170819  78.556828  1377.318509  74.028607  26.882429
 6      20170820  57.092189  1239.530625  51.942213  22.056378
 7      20170821  76.278975  1493.385548  74.801641  27.471604
 8      20170822  79.006604  1241.603185  75.360606  28.250994
 9      20170823  76.097351  1243.586084  73.459963  24.500618
10      20170824  64.860259  1231.325899  63.205554  25.015120
11      20170825  70.407325   975.091107  64.180692  27.177654
12      20170826  87.742284  1351.306100  79.049023  27.860549
13      20170827  58.014005  1208.424489  51.963388  21.049374
14      20170828  65.774114  1289.341335  59.922912  24.481232

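Answer:

I believe you first need the parse_dates parameter of read_csv to parse the column to datetimes, then group by the weekday name and aggregate. On pandas older than 0.23 use dt.weekday_name instead of dt.day_name(); weekday_name was removed in pandas 1.0.

import pandas as pd

df_data = pd.read_csv('data.csv', parse_dates=['arrival_date'])

week_df = df_data.groupby(df_data['arrival_date'].dt.day_name()).mean()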

print(week_df)

                price_1      price_2    price_3    price_4
arrival_date
Friday        71.952235  1130.106565  64.900476  25.779593
Monday        71.026544  1391.363442  67.362277  25.976418
Saturday      83.149556  1364.312304  76.538815  27.371489
Sunday        57.553097  1223.977557  51.952801  21.552876
Thursday      66.831764  1364.982781  63.940393  23.887128
Tuesday       79.006604  1241.603185  75.360606  28.250994
Wednesday     76.021324  1276.650570  72.485089  23.611288

For a numeric index, use weekday (Monday is 0, Sunday is 6):

week_df = df_data.groupby(df_data['arrival_date'].dt.weekday).mean() 

print(week_df)

                price_1      price_2    price_3    price_4
arrival_date
0             71.026544  1391.363442  67.362277  25.976418
1             79.006604  1241.603185  75.360606  28.250994
2             76.021324  1276.650570  72.485089  23.611288
3             66.831764  1364.982781  63.940393  23.887128
4             71.952235  1130.106565  64.900476  25.779593
5             83.149556  1364.312304  76.538815  27.371489
6             57.553097  1223.977557  51.952801  21.552876
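The question also asks for the median and standard deviation, not only the mean. A minimal sketch of how that might look, assuming the three sample rows below (copied from the question's table, limited to two price columns) and the illustrative names df, price_cols and stats, which are not part of the original answer:

```python
import pandas as pd

# Three rows taken from the question's table: two Mondays and one Tuesday.
df = pd.DataFrame({
    'arrival_date': pd.to_datetime(
        ['20170821', '20170822', '20170828'], format='%Y%m%d'),
    'price_1': [76.278975, 79.006604, 65.774114],
    'price_2': [1493.385548, 1241.603185, 1289.341335],
})

# Select the price columns explicitly so the date column is not aggregated,
# then compute several statistics per weekday (Monday=0 ... Sunday=6).
price_cols = ['price_1', 'price_2']
stats = (
    df.groupby(df['arrival_date'].dt.weekday)[price_cols]
      .agg(['mean', 'median', 'std'])
)
print(stats)
```

The result has one row per weekday present in the data and a two-level column index (price column, statistic). A weekday with a single observation gets NaN for std, since the sample standard deviation needs at least two values.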

Edit:

For the correct order, add reindex (dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0):

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

week_df = df_data.groupby(df_data['arrival_date'].dt.day_name()).mean().reindex(days)

print(week_df)

                price_1      price_2    price_3    price_4
arrival_date
Monday        71.026544  1391.363442  67.362277  25.976418
Tuesday       79.006604  1241.603185  75.360606  28.250994
Wednesday     76.021324  1276.650570  72.485089  23.611288
Thursday      66.831764  1364.982781  63.940393  23.887128
Friday        71.952235  1130.106565  64.900476  25.779593
Saturday      83.149556  1364.312304  76.538815  27.371489
Sunday        57.553097  1223.977557  51.952801  21.552876
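In the question the dates are the index rather than a column. In that case the same grouping can probably be done directly on the DatetimeIndex, without the .dt accessor. A small sketch under that assumption, using three rows from the question's table and the illustrative names df and by_day:

```python
import pandas as pd

# Two Wednesdays and one Thursday from the question's table,
# with arrival_date as the index (as described in the question).
df = pd.DataFrame(
    {'price_1': [75.945298, 68.803269, 76.097351]},
    index=pd.to_datetime(['20170816', '20170817', '20170823'], format='%Y%m%d'),
)
df.index.name = 'arrival_date'

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']

# A DatetimeIndex exposes day_name() directly; reindex puts the rows
# in calendar order and inserts NaN rows for weekdays with no data.
by_day = df.groupby(df.index.day_name()).mean().reindex(days)
print(by_day.dropna())
```

Because reindex always returns all seven weekdays, days that never occur in the data show up as NaN rows; dropna() hides them for printing only.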

Source: utcz.com/qa/261787.html
