Python pandas: splitting a column of lists into multiple columns

I have a pandas DataFrame with one column that looks like this:

In [207]: df2.teams
Out[207]:
0    [SF, NYG]
1    [SF, NYG]
2    [SF, NYG]
3    [SF, NYG]
4    [SF, NYG]
5    [SF, NYG]
6    [SF, NYG]
7    [SF, NYG]

I need to split this column of lists into two columns, team1 and team2, using pandas.

Answer:

You can pass a list of lists to the DataFrame constructor. To get one, take the column's underlying numpy array with values and convert it with tolist:

import pandas as pd

# Sample data: every row holds a two-element list
d1 = {'teams': [['SF', 'NYG'], ['SF', 'NYG'], ['SF', 'NYG'],
                ['SF', 'NYG'], ['SF', 'NYG'], ['SF', 'NYG'], ['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print(df2)

       teams
0  [SF, NYG]
1  [SF, NYG]
2  [SF, NYG]
3  [SF, NYG]
4  [SF, NYG]
5  [SF, NYG]
6  [SF, NYG]

# Expand the lists into two new columns; index=df2.index keeps the rows aligned with df2
df2[['team1', 'team2']] = pd.DataFrame(df2.teams.values.tolist(), index=df2.index)
print(df2)

       teams team1 team2
0  [SF, NYG]    SF   NYG
1  [SF, NYG]    SF   NYG
2  [SF, NYG]    SF   NYG
3  [SF, NYG]    SF   NYG
4  [SF, NYG]    SF   NYG
5  [SF, NYG]    SF   NYG
6  [SF, NYG]    SF   NYG
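
The index=df2.index argument matters once the DataFrame does not have the default 0..n-1 index. The sketch below illustrates this with made-up data and a hypothetical index of 10, 20, 30. It uses join rather than column assignment purely to make the alignment visible; the names df, aligned and misaligned are not from the original answer.

```python
import pandas as pd

# Hypothetical frame with a non-default index
df = pd.DataFrame(
    {"teams": [["SF", "NYG"], ["DAL", "PHI"], ["GB", "CHI"]]},
    index=[10, 20, 30],
)

# With index=df.index the expanded frame lines up with the original rows
aligned = pd.DataFrame(df["teams"].tolist(), index=df.index,
                       columns=["team1", "team2"])
good = df.join(aligned)

# Without it the expanded frame is indexed 0, 1, 2, so nothing matches 10, 20, 30
misaligned = pd.DataFrame(df["teams"].tolist(), columns=["team1", "team2"])
bad = df.join(misaligned)

print(good)
print(bad)  # team1 and team2 are all missing here
```

Because pandas aligns on index labels, the second join finds no matching rows and fills the new columns with missing values.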

To create a new DataFrame instead:

df3 = pd.DataFrame(df2['teams'].values.tolist(), columns=['team1', 'team2'])
print(df3)

  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
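
The example above assumes every list has exactly two elements. As a rough sketch of what happens otherwise, the following uses made-up data in which one row has a single team; this case is an assumption and does not appear in the original question.

```python
import pandas as pd

# Hypothetical data: the second row lists only one team
df = pd.DataFrame({"teams": [["SF", "NYG"], ["DAL"], ["GB", "CHI"]]})

df3 = pd.DataFrame(df["teams"].tolist(), columns=["team1", "team2"],
                   index=df.index)

# Rows shorter than the longest list get a missing value in the extra column
print(df3)
print(df3["team2"].isna().tolist())
```

If short rows are possible in real data, checking isna() on the new columns afterwards is a cheap way to find them.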

The apply(pd.Series) solution is very slow by comparison:

# 7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)

In [89]: %timeit df2['teams'].apply(pd.Series)
1 loop, best of 3: 1.15 s per loop

In [90]: %timeit pd.DataFrame(df2['teams'].values.tolist(), columns=['team1','team2'])
1000 loops, best of 3: 820 µs per loop
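
The %timeit magic above only works inside IPython. For a plain script, the comparison might be sketched with the standard-library timeit module as below. The helper names via_apply and via_constructor are made up for illustration, and the absolute timings will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# 7k rows, matching the size used in the timings above
df = pd.DataFrame({"teams": [["SF", "NYG"]] * 7000})


def via_apply():
    # Builds one Series per row, which is where the time goes
    return df["teams"].apply(pd.Series)


def via_constructor():
    # Hands the whole list of lists to the DataFrame constructor at once
    return pd.DataFrame(df["teams"].tolist(), columns=["team1", "team2"])


t_apply = timeit.timeit(via_apply, number=1)
t_ctor = timeit.timeit(via_constructor, number=1)
print(f"apply(pd.Series): {t_apply:.4f} s")
print(f"constructor:      {t_ctor:.4f} s")
```

Both functions return the same values; only the column labels differ (0 and 1 for apply, team1 and team2 for the constructor).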

