Python: Line plot of values grouped by multiple columns

I have a DataFrame with 2 columns: genre and release_year. Each year has multiple genres. The format is as follows:

genre release_year 

Action 2015

Action 2015

Adventure 2015

Action 2015

Action 2015

I need to plot how the counts of all genres change over the years, using pandas/Python.

import pandas as pd

df = pd.read_csv('genres.csv')

df.shape

(53975, 2)

df_new = df.groupby(['release_year', 'genre'])['genre'].count()

This produces the following grouping.

release_year genre   

1960 Action 8

Adventure 5

Comedy 8

Crime 2

Drama 13

Family 3

Fantasy 2

Foreign 1

History 5

Horror 7

Music 1

Romance 6

Science Fiction 3

Thriller 6

War 2

Western 6

1961 Action 7

Adventure 6

Animation 1

Comedy 10

Crime 2

Drama 16

Family 5

Fantasy 2

Foreign 1

History 3

Horror 3

Music 2

Mystery 1

Romance 7

...

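I need to draw line plots of how each genre's count changes over the years. In other words, I need a loop that plots the various genres across the years. For example,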

df_action = df.query('genre == "Action"') 

result_plot = df_action.groupby(['release_year','genre'])['genre'].count()

result_plot.plot(figsize=(10,10));

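shows the plot for the genre "Action". Rather than plotting each genre separately like this, I need a loop that does the same for all of them.

How can I do this? Can anyone help me?

I tried the following, but it does not work.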

genres = ["Action", "Adventure", "Western", "Science Fiction", "Drama",
          "Family", "Comedy", "Crime", "Romance", "War", "Mystery",
          "Thriller", "Fantasy", "History", "Animation", "Horror", "Music",
          "Documentary", "TV Movie", "Foreign"]

for g in genres:
    #df_new = df.query('genre == "g"')
    result_plot = df.groupby(['release_year','genre'])['genre'].count()
    result_plot.plot(figsize=(10,10));
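
The attempt above never uses the loop variable: the filter is commented out, and even when enabled it compares against the literal string "g" rather than the value of g, so every iteration plots the same grouping of the whole DataFrame. A minimal sketch of a loop that might do what was intended is shown below. The small DataFrame is a made-up stand-in for genres.csv, and the names counts_by_genre, fig and ax are introduced here only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for genres.csv: one row per (genre, release_year) pair
df = pd.DataFrame({
    "genre": ["Action", "Action", "Adventure", "Action",
              "Drama", "Drama", "Adventure"],
    "release_year": [2014, 2015, 2015, 2015, 2014, 2015, 2014],
})

genres = ["Action", "Adventure", "Drama"]

# One shared Axes so every genre ends up as a line in the same figure
fig, ax = plt.subplots(figsize=(10, 10))
counts_by_genre = {}
for g in genres:
    # Filter on the value of the loop variable g, not the literal string "g"
    df_g = df[df["genre"] == g]
    # Number of rows per year for this genre only
    counts = df_g.groupby("release_year").size()
    counts_by_genre[g] = counts
    counts.plot(ax=ax, label=g)

ax.set_xlabel("release_year")
ax.set_ylabel("count")
ax.legend()
# In an interactive session, call plt.show() here to display the figure
```

If the query syntax is preferred, df.query('genre == @g') refers to the Python variable g and should select the same rows as the boolean mask used here.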

Answer:

How about unstacking your Series and plotting everything with a single command:

In [36]: s 

Out[36]:

release_year genre

1960.0 Action 8

Adventure 5

Comedy 8

Crime 2

Drama 13

Family 3

Fantasy 2

Foreign 1

History 5

Horror 7

..

1961.0 Crime 2

Drama 16

Family 5

Fantasy 2

Foreign 1

History 3

Horror 3

Music 2

Mystery 1

Romance 7

Name: count, Length: 30, dtype: int64

In [37]: s.unstack()

Out[37]:

genre Action Adventure Animation Comedy Crime Drama Family Fantasy Foreign History Horror Music Mystery Romance \

release_year

1960.0 8.0 5.0 NaN 8.0 2.0 13.0 3.0 2.0 1.0 5.0 7.0 1.0 NaN 6.0

1961.0 7.0 6.0 1.0 10.0 2.0 16.0 5.0 2.0 1.0 3.0 3.0 2.0 1.0 7.0

genre Science Fiction Thriller War Western

release_year

1960.0 3.0 6.0 2.0 6.0

1961.0 NaN NaN NaN NaN

Plotting:

s.unstack().plot() 
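
As the Out[37] table shows, year/genre pairs with no rows become NaN after unstacking, which leaves gaps in the lines. The sketch below shows one possible variation that fills those cells with 0 via the fill_value argument of unstack. The sample data and the name wide are assumptions made for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "genre": ["Action", "Action", "Action", "Adventure", "Drama"],
    "release_year": [2014, 2015, 2015, 2015, 2014],
})

# Series of counts indexed by (release_year, genre), like s in the answer
s = df.groupby(["release_year", "genre"]).size()

# Move genre from the index to the columns; missing pairs become 0, not NaN
wide = s.unstack(fill_value=0)

# One line per genre column, with release_year on the x-axis
ax = wide.plot(figsize=(10, 10))
```

Whether 0 or a gap is the better way to show a year without any titles in a genre depends on how the chart will be read; dropping fill_value keeps the NaN behaviour shown in the answer.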

Answer:

df_new.unstack().T.plot(kind='bar') 

I chose a bar chart here; you can change the kind to whatever you need.

PS: you could consider crosstab instead of groupby

pd.crosstab(df.genre,df.release_year).plot(kind='bar') 
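
Since the question asks for lines over the years, the crosstab can also be built with release_year as the rows, so that the default line plot draws one line per genre. This is a sketch on made-up sample data; the name table is introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "genre": ["Action", "Action", "Action", "Adventure", "Drama"],
    "release_year": [2014, 2015, 2015, 2015, 2014],
})

# Rows are years, columns are genres, cells are counts (0 where no rows exist)
table = pd.crosstab(df["release_year"], df["genre"])

# Default kind is a line plot: index on the x-axis, one line per column
ax = table.plot(figsize=(10, 10))
```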

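Answer:

I recommend seaborn, which helps you avoid reshaping the DataFrame before plotting. You can install it by running pip install seaborn. It has a simple API for a variety of standard plots:

release_year vs. genre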

import seaborn as sns 

sns.countplot(x='release_year', hue='genre', data=df)

genre vs. release_year

import seaborn as sns 

sns.countplot(x='genre', hue='release_year', data=df)

