Expanding a pandas DataFrame column into multiple rows

If I have a DataFrame like this:

pd.DataFrame({"name": "John",
              "days": [[1, 3, 5, 7]]})

which gives the following structure:

           days  name
0  [1, 3, 5, 7]  John

How can I expand it into the following?

   days  name
0     1  John
1     3  John
2     5  John
3     7  John

Answer:

You can use df.itertuples to iterate over each row, and a list comprehension to reshape the data into the desired form:

import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})
result = pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])
print(result)

which yields

   0     1
0  1  John
1  3  John
2  5  John
3  7  John
4  2  Eric
5  4  Eric
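The result above carries the default column labels 0 and 1, while the question asks for days and name. A minimal sketch of the same itertuples approach with labeled columns follows; passing index=False is an optional choice made here, since the index is not used.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})

# index=False drops the index field from each tuple, since it is not used here.
rows = [(d, tup.name)
        for tup in df.itertuples(index=False)
        for d in tup.days]

# Passing columns= labels the result instead of leaving the defaults 0 and 1.
result = pd.DataFrame(rows, columns=["days", "name"])
print(result)
```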


ivakar's solution, using_repeat, is the fastest:

In [48]: %timeit using_repeat(df)
1000 loops, best of 3: 834 µs per loop

In [5]: %timeit using_itertuples(df)
100 loops, best of 3: 3.43 ms per loop

In [7]: %timeit using_apply(df)
1 loop, best of 3: 379 ms per loop

In [8]: %timeit using_append(df)
1 loop, best of 3: 3.59 s per loop
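The %timeit lines are IPython magics. Outside IPython, a similar measurement can be sketched with the standard-library timeit module. The two functions below mirror using_repeat and using_itertuples from the benchmark setup; the fixed seed and the repeat counts are assumptions made for this sketch, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatable data
N = 10**3
df = pd.DataFrame({
    "name": rng.choice(list("ABCD"), size=N),
    "days": [rng.integers(10, size=rng.integers(5)) for _ in range(N)],
})


def using_repeat(df):
    lens = [len(item) for item in df["days"]]
    return pd.DataFrame({"name": np.repeat(df["name"].values, lens),
                         "days": np.concatenate(df["days"].values)})


def using_itertuples(df):
    return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])


timings = {}
for func in (using_repeat, using_itertuples):
    # repeat=3 and number=10 are arbitrary; keep the best per-call time.
    runs = timeit.repeat(lambda: func(df), repeat=3, number=10)
    timings[func.__name__] = min(runs) / 10
    print(f"{func.__name__}: {timings[func.__name__] * 1e3:.3f} ms per call")
```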


Here is the setup used for the benchmark above:

import numpy as np
import pandas as pd

N = 10**3
df = pd.DataFrame({"name": np.random.choice(list('ABCD'), size=N),
                   "days": [np.random.randint(10, size=np.random.randint(5))
                            for i in range(N)]})

def using_itertuples(df):
    return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])

def using_repeat(df):
    # Repeat each name once per element of its list, then flatten the lists.
    lens = [len(item) for item in df['days']]
    return pd.DataFrame({"name": np.repeat(df['name'].values, lens),
                         "days": np.concatenate(df['days'].values)})

def using_apply(df):
    return (df.apply(lambda x: pd.Series(x.days), axis=1)
              .stack()
              .reset_index(level=1, drop=True)
              .to_frame('day')
              .join(df['name']))

def using_append(df):
    # Note: DataFrame.append was removed in pandas 2.0, so this function
    # only runs on older versions; it is kept here as benchmarked.
    df2 = pd.DataFrame(columns=df.columns)
    for i, r in df.iterrows():
        for e in r.days:
            new_r = r.copy()
            new_r.days = e
            df2 = df2.append(new_r)
    return df2
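On pandas 0.25 or later, the built-in DataFrame.explode performs the same reshaping. It is not part of the benchmark above, so no timing is claimed for it. A minimal sketch on hypothetical sample data, including a row with an empty list to show how explode treats it:

```python
import pandas as pd

# Hypothetical sample data; "Ann" has an empty list to show the edge case.
df = pd.DataFrame({"name": ["John", "Eric", "Ann"],
                   "days": [[1, 3, 5, 7], [2, 4], []]})

# explode emits one row per list element and repeats the other columns.
# A row whose list is empty is kept, with NaN in the exploded column.
exploded = df.explode("days")

# Drop the NaN rows to match the benchmark functions, which emit no rows
# for empty lists, then renumber the index and restore an integer dtype.
result = (exploded.dropna(subset=["days"])
                  .reset_index(drop=True)
                  .astype({"days": int}))
print(result)
```

Whether to drop or keep the NaN rows depends on whether names with no days should survive in the output.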

