pandas groupby + apply + lambda: how do I split each group into subgroups, where the second split uses a custom condition?

Sample data
import datetime
import pandas as pd

a = pd.DataFrame([[2,3],[2,1],[2,1],[3,4],[3,1],[3,1],[3,1],[3,1],[4,2],[4,1],[4,1],[4,1]],columns=['id','count'])
a['date'] = [datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S') for x in
             ['2016-12-28 15:17:00','2016-12-28 15:29:00','2017-01-05 09:32:00','2016-12-03 18:10:00','2016-12-10 11:31:00',
              '2016-12-14 09:32:00','2016-12-18 09:31:00','2016-12-22 09:32:00','2016-11-28 15:31:00','2016-12-01 16:11:00',
              '2016-12-10 09:31:00','2016-12-13 12:06:00']]

Loop-based implementation
# Sort by id, newest date first, so each row's neighbour is the next-older record.
a.sort_values(by=['id','date'],ascending = [True,False],inplace=True)
a['id'] = a['id'].astype(str)
a['id_up'] = a['id'].shift(-1)
a['date_up'] = a['date'].shift(-1)
# Gap in days to the next (older) row of the same id; 0 on the last row of each id.
a['date_diff'] = a.apply(lambda row: (row['date'] - row['date_up'])/pd.Timedelta(days=1) if row['id'] == row['id_up'] else 0, axis=1)
a = a.reset_index(drop=True)
a = a.drop(['id_up','date_up'],axis=1)
a['new'] = ''
for i in range(a.shape[0]):
    if i == 0:
        a.loc[i,'new'] = 1
    else:
        if a.loc[i,'id'] != a.loc[i-1,'id']:
            # A new id always restarts the subgroup counter.
            a.loc[i,'new'] = 1
        else:
            if a.loc[i-1,'date_diff'] <= 4:
                # Within 4 days of the previous row: same subgroup.
                a.loc[i,'new'] = a.loc[i-1,'new']
            else:
                # More than 4 days apart: start the next subgroup.
                a.loc[i,'new'] = a.loc[i-1,'new'] + 1

a['new'] = a['id'].astype(str) + '-' + a['new'].astype(str)
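I have many data sources and already run a lot of processes, so for this processing step I can only consider threads or coroutines.
I would rather solve it with pandas' own features, though. Doing the same thing in Excel is much faster than my Python version.
Right now I have 3.5 million rows, and after 12 hours of processing it still has not finished...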
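
For reference, the loop's rule (restart the counter for each id, and start a new subgroup whenever the gap to the previous, more recent row exceeds 4 days) can probably be expressed without row-by-row `.loc` writes. The following is only a sketch under that reading of the loop. It swaps `groupby` + `apply` + `lambda` for a grouped `shift` and a grouped `cumsum`. The names `label_subgroups` and `max_gap_days` are made up for illustration and do not come from the original post.

```python
import pandas as pd


def label_subgroups(df, max_gap_days=4):
    """Hypothetical helper: label rows '<id>-<n>', starting a new n whenever
    the gap to the previous (more recent) row of the same id exceeds max_gap_days."""
    # Same ordering as the loop version: id ascending, newest date first.
    out = df.sort_values(['id', 'date'], ascending=[True, False]).reset_index(drop=True)

    # Date of the previous row within the same id (NaT on the first row of each id).
    prev_date = out.groupby('id')['date'].shift(1)

    # Gap in days to the previous row; NaN on the first row of each id.
    gap_days = (prev_date - out['date']) / pd.Timedelta(days=1)

    # True where a new subgroup starts. NaN > max_gap_days evaluates to False,
    # so the first row of each id never counts as a break.
    starts_new = gap_days > max_gap_days

    # Running count of breaks inside each id, shifted so numbering starts at 1.
    sub_no = starts_new.astype(int).groupby(out['id']).cumsum() + 1

    out['new'] = out['id'].astype(str) + '-' + sub_no.astype(str)
    return out
```

Calling `label_subgroups(a)` on the sample data above should give the same `new` labels as the loop, but every step is a whole-column operation, so it should scale far better to millions of rows. Whether it matches the real data depends on the assumption that the 4-day rule is the only splitting condition.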

