Python: line plot of values grouped by multiple columns
I have a DataFrame with two columns: genre and release_year. Each year has multiple genres. The format is as follows:
genre release_year
Action 2015
Action 2015
Adventure 2015
Action 2015
Action 2015
I need to plot how all of the genres change over time using pandas/Python.
import pandas as pd
df = pd.read_csv('genres.csv')
df.shape
(53975, 2)
df_new = df.groupby(['release_year', 'genre'])['genre'].count()
This produces the following grouping.
release_year  genre
1960          Action              8
              Adventure           5
              Comedy              8
              Crime               2
              Drama              13
              Family              3
              Fantasy             2
              Foreign             1
              History             5
              Horror              7
              Music               1
              Romance             6
              Science Fiction     3
              Thriller            6
              War                 2
              Western             6
1961          Action              7
              Adventure           6
              Animation           1
              Comedy             10
              Crime               2
              Drama              16
              Family              5
              Fantasy             2
              Foreign             1
              History             3
              Horror              3
              Music               2
              Mystery             1
              Romance             7
...
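I need to draw line plots showing how the genre counts change over the years, i.e. I need a loop that plots each genre across the years. For example,
df_action = df.query('genre == "Action"')
result_plot = df_action.groupby(['release_year','genre'])['genre'].count()
result_plot.plot(figsize=(10,10));
shows the plot for the genre "Action". Rather than plotting each genre separately like this, I need a loop that does the same thing.
How can I do this? Can anyone help me?
I tried the following, but it does not work.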
genres = ["Action", "Adventure", "Western", "Science Fiction", "Drama", "Family", "Comedy", "Crime", "Romance", "War", "Mystery",
"Thriller", "Fantasy", "History", "Animation", "Horror", "Music",
"Documentary", "TV Movie", "Foreign"]
for g in genres:
    #df_new = df.query('genre == "g"')
    result_plot = df.groupby(['release_year','genre'])['genre'].count()
    result_plot.plot(figsize=(10,10));
Answer:
How about unstacking your Series and plotting everything with a single command:
In [36]: s
Out[36]:
release_year  genre
1960.0        Action        8
              Adventure     5
              Comedy        8
              Crime         2
              Drama        13
              Family        3
              Fantasy       2
              Foreign       1
              History       5
              Horror        7
              ..
1961.0        Crime         2
              Drama        16
              Family        5
              Fantasy       2
              Foreign       1
              History       3
              Horror        3
              Music         2
              Mystery       1
              Romance       7
Name: count, Length: 30, dtype: int64
In [37]: s.unstack()
Out[37]:
genre         Action  Adventure  Animation  Comedy  Crime  Drama  Family  Fantasy  Foreign  History  Horror  Music  Mystery  Romance  \
release_year
1960.0           8.0        5.0        NaN     8.0    2.0   13.0     3.0      2.0      1.0      5.0     7.0    1.0      NaN      6.0
1961.0           7.0        6.0        1.0    10.0    2.0   16.0     5.0      2.0      1.0      3.0     3.0    2.0      1.0      7.0
genre         Science Fiction  Thriller  War  Western
release_year
1960.0                    3.0       6.0  2.0      6.0
1961.0                    NaN       NaN  NaN      NaN
Plot:
s.unstack().plot()
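For completeness, here is a minimal end-to-end sketch of the unstack approach. The small DataFrame is made-up sample data standing in for genres.csv, and size() is used to count the rows in each group; treat it as an illustration under those assumptions, not as the exact code from the question.

```python
import pandas as pd

# Made-up sample data standing in for genres.csv (an assumption for illustration).
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})

# Count rows per (year, genre); the result is a Series with a two-level index.
counts = df.groupby(["release_year", "genre"]).size()

# Move the genre level into the columns; a missing (year, genre) pair becomes 0.
wide = counts.unstack(fill_value=0)

# DataFrame.plot draws one line per column, with the index (years) on the x-axis.
ax = wide.plot(figsize=(10, 10))
ax.set_xlabel("release_year")
ax.set_ylabel("number of titles")
```

If an explicit loop is still wanted, iterating over wide.columns and calling wide[genre].plot(ax=ax, label=genre) on a shared axes should give an equivalent chart. Outside a notebook, matplotlib.pyplot.show() is needed to display the figure.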
Answer:
df_new.unstack().T.plot(kind='bar')
I chose a bar chart here; you can change the kind to whatever you need.
PS: you could consider crosstab instead of groupby
pd.crosstab(df.genre,df.release_year).plot(kind='bar')
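As a rough check that crosstab and groupby produce the same counts, the sketch below builds both tables from made-up sample data (an assumption standing in for genres.csv). Note that release_year is passed first here so that the years end up on the index, which is the orientation a line plot over time needs.

```python
import pandas as pd

# Made-up sample data standing in for genres.csv (an assumption for illustration).
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})

# crosstab counts co-occurrences directly: rows are years, columns are genres.
by_year = pd.crosstab(df["release_year"], df["genre"])

# The same table built with groupby, filling absent combinations with 0.
via_groupby = df.groupby(["release_year", "genre"]).size().unstack(fill_value=0)

print(by_year)
print(via_groupby)
```

crosstab fills combinations that never occur with 0, whereas a plain unstack() leaves them as NaN. Calling by_year.plot() on the result draws one line per genre.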
Answer:
I recommend using seaborn. It helps you avoid having to process the DataFrame before plotting. You can install it by running pip install seaborn. It has a simple API for a variety of standard plots:
release_year vs. genre
import seaborn as sns
sns.countplot(x='release_year', hue='genre', data=df)
genre vs. release_year
import seaborn as sns
sns.countplot(x='genre', hue='release_year', data=df)