Expanding a pandas DataFrame column into multiple rows

Z时代
2024-01-10
Category: Q&A

Suppose I have a DataFrame like this:

pd.DataFrame( {"name" : "John", 
               "days" : [[1, 3, 5, 7]]
              })

which gives the following structure:

           days  name
0  [1, 3, 5, 7]  John

How can I expand it into the following?

   days  name
0     1  John
1     3  John
2     5  John
3     7  John

Answer:

You can use df.itertuples to iterate over each row, and use a list comprehension to reshape the data into the desired form:

import pandas as pd
df = pd.DataFrame( {"name" : ["John", "Eric"], 
               "days" : [[1, 3, 5, 7], [2,4]]})
result = pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])
print(result)

which yields

   0     1
0  1  John
1  3  John
2  5  John
3  7  John
4  2  Eric
5  4  Eric
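The result above carries the default column labels 0 and 1. A minimal sketch, assuming you want the days and name labels shown in the question, is to pass columns= to the DataFrame constructor; the rest is the same list comprehension:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})

# Same list comprehension as in the answer; columns= labels the two tuple fields.
result = pd.DataFrame(
    [(d, tup.name) for tup in df.itertuples() for d in tup.days],
    columns=["days", "name"],
)
print(result)
```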

ivakar's solution, using_repeat, is the fastest:

In [48]: %timeit using_repeat(df)
1000 loops, best of 3: 834 µs per loop
In [5]: %timeit using_itertuples(df)
100 loops, best of 3: 3.43 ms per loop
In [7]: %timeit using_apply(df)
1 loop, best of 3: 379 ms per loop
In [8]: %timeit using_append(df)
1 loop, best of 3: 3.59 s per loop

Here is the setup used for the benchmark above:

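import numpy as np
import pandas as pd

N = 10**3
# Random test data: N rows, each with a name and an array of 0 to 4 days.
df = pd.DataFrame({"name": np.random.choice(list('ABCD'), size=N),
                   "days": [np.random.randint(10, size=np.random.randint(5))
                            for i in range(N)]})

def using_itertuples(df):
    # Build one (day, name) tuple per list element.
    return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])

def using_repeat(df):
    # Repeat each name once per element of its list, then flatten the lists.
    lens = [len(item) for item in df['days']]
    return pd.DataFrame({"name": np.repeat(df['name'].values, lens),
                         "days": np.concatenate(df['days'].values)})

def using_apply(df):
    # Expand each list into a Series, stack to long form, then join the names back.
    return (df.apply(lambda x: pd.Series(x.days), axis=1)
            .stack()
            .reset_index(level=1, drop=True)
            .to_frame('day')
            .join(df['name']))

def using_append(df):
    # Append one row at a time (slow: every append copies the whole frame).
    # Note: DataFrame.append was removed in pandas 2.0, so this function
    # only runs on older versions.
    df2 = pd.DataFrame(columns=df.columns)
    for i, r in df.iterrows():
        for e in r.days:
            new_r = r.copy()
            new_r.days = e
            df2 = df2.append(new_r)
    return df2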
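To see why using_repeat lines the values up correctly, here is a small sketch on the two-row frame from the answer: np.repeat copies each name once per element of its list, and np.concatenate flattens the lists in the same order. The cross-check against DataFrame.explode at the end is not part of the original answer and assumes pandas 0.25 or later, where that method is available.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})

lens = [len(item) for item in df["days"]]    # one length per row
names = np.repeat(df["name"].values, lens)   # each name repeated len(days) times
days = np.concatenate(df["days"].values)     # all lists flattened into one array

result = pd.DataFrame({"name": names, "days": days})
print(result)

# Cross-check (assumes pandas >= 0.25): explode yields the same rows, but it
# keeps the original index and an object dtype, so both are normalized here.
exploded = df.explode("days").reset_index(drop=True)
exploded["days"] = exploded["days"].astype(int)
print(exploded)
```

On recent pandas versions explode is likely the simplest option; the repeat-based version is shown because it is the one benchmarked above.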

Source link: utcz.com/qa/415008.html
