Binning a column with Python pandas

I have a DataFrame column with numeric values:

df['percentage'].head()

46.5
44.2
100.0
42.12

I want to view this column as counts per bin, using these bin edges:

bins = [0, 1, 5, 10, 25, 50, 100]

How can I get the result as value counts per bin, like this?

[0, 1] bin amount
[1, 5] etc
[5, 10] etc
......

Answer:

You can use pandas.cut:

import pandas as pd

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = pd.cut(df['percentage'], bins)
print(df)

   percentage     binned
0       46.50   (25, 50]
1       44.20   (25, 50]
2      100.00  (50, 100]
3       42.12   (25, 50]
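
Because the bin edges start at 0 and pd.cut builds right-closed intervals by default, a value of exactly 0 would not fall into any bin. A small sketch with hypothetical edge values (0.0, 0.5, 100.0, not from the question) showing the include_lowest and right parameters:

```python
import pandas as pd

bins = [0, 1, 5, 10, 25, 50, 100]
# Hypothetical values chosen to hit the edges of the bin range.
values = pd.Series([0.0, 0.5, 100.0])

# Default: right-closed intervals (0, 1], (1, 5], ..., so 0.0 is outside every bin -> NaN.
default = pd.cut(values, bins)

# include_lowest=True extends the first interval slightly below 0, so 0.0 is counted.
inclusive = pd.cut(values, bins, include_lowest=True)

# right=False switches to left-closed intervals [0, 1), ..., [50, 100), so 100.0 becomes NaN.
left_closed = pd.cut(values, bins, right=False)

print(default.isna().tolist())      # [True, False, False]
print(inclusive.isna().tolist())    # [False, False, False]
print(left_closed.isna().tolist())  # [False, False, True]
```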

To get integer labels instead of intervals, pass labels:

bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1, 2, 3, 4, 5, 6]
df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
print(df)

   percentage binned
0       46.50      5
1       44.20      5
2      100.00      6
3       42.12      5
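
The labels list needs exactly len(bins) - 1 entries, one per interval. If you only need bin positions, labels=False returns 0-based integer codes instead of a categorical. A minimal sketch using the four sample values from the question:

```python
import pandas as pd

df = pd.DataFrame({'percentage': [46.5, 44.2, 100.0, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]

# One label per interval: 7 edges -> 6 labels.
labels = [1, 2, 3, 4, 5, 6]
named = pd.cut(df['percentage'], bins=bins, labels=labels)

# labels=False skips the categories and returns 0-based integer bin positions.
codes = pd.cut(df['percentage'], bins=bins, labels=False)

print(named.tolist())  # [5, 5, 6, 5]
print(codes.tolist())  # [4, 4, 5, 4]
```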

Or numpy.searchsorted:

import numpy as np

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = np.searchsorted(bins, df['percentage'].values)
print(df)

   percentage  binned
0       46.50       5
1       44.20       5
2      100.00       6
3       42.12       5
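
With the default side='left', searchsorted returns i such that bins[i-1] < x <= bins[i], which matches pd.cut's right-closed intervals; np.digitize with right=True follows the same rule. Unlike pd.cut, values outside the range are not turned into NaN but get 0 or len(bins). A sketch assuming two extra out-of-range values (0.0 and 150.0) appended to the question's data:

```python
import numpy as np
import pandas as pd

bins = [0, 1, 5, 10, 25, 50, 100]
# The question's four values plus two hypothetical out-of-range values.
values = np.array([46.5, 44.2, 100.0, 42.12, 0.0, 150.0])

# side='left' (the default): bins[i-1] < x <= bins[i] -> i
pos = np.searchsorted(bins, values, side='left')

# np.digitize with right=True applies the same rule.
dig = np.digitize(values, bins, right=True)

# For in-range values this equals pd.cut's 0-based codes plus one.
cut_codes = pd.cut(values[:4], bins, labels=False) + 1

print(pos.tolist())        # [5, 5, 6, 5, 0, 7]
print(cut_codes.tolist())  # [5, 5, 6, 5]
```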

...and then use value_counts, or groupby and aggregate with size:

s = pd.cut(df['percentage'], bins=bins).value_counts()
print(s)

(25, 50]     3
(50, 100]    1
(10, 25]     0
(5, 10]      0
(1, 5]       0
(0, 1]       0
Name: percentage, dtype: int64
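
value_counts sorts by count by default, which scrambles the bin order. Passing sort=False keeps the intervals in their natural order, and normalize=True gives fractions instead of counts. A minimal sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'percentage': [46.5, 44.2, 100.0, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]
binned = pd.cut(df['percentage'], bins=bins)

# sort=False keeps the bins in interval order, (0, 1] first.
s = binned.value_counts(sort=False)
print(s)

# normalize=True returns each bin's share of all rows.
share = binned.value_counts(sort=False, normalize=True)
print(share.tolist())  # [0.0, 0.0, 0.0, 0.0, 0.75, 0.25]
```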

s = df.groupby(pd.cut(df['percentage'], bins=bins)).size()
print(s)

percentage
(0, 1]       0
(1, 5]       0
(5, 10]      0
(10, 25]     0
(25, 50]     3
(50, 100]    1
dtype: int64
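
When grouping by a categorical such as the output of pd.cut, the observed parameter controls whether empty bins appear. Recent pandas releases (2.1 onward, as far as I know) emit a FutureWarning if it is left unset, so passing it explicitly is a reasonable precaution. A sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'percentage': [46.5, 44.2, 100.0, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]
binned = pd.cut(df['percentage'], bins=bins)

# observed=False keeps every category, including the empty bins.
all_bins = df.groupby(binned, observed=False).size()

# observed=True keeps only the bins that actually contain rows.
seen_bins = df.groupby(binned, observed=True).size()

print(all_bins.tolist())   # [0, 0, 0, 0, 3, 1]
print(seen_bins.tolist())  # [3, 1]
```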

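By default, cut returns a categorical.

Series methods such as Series.value_counts() use all categories, even those not present in the data (see the pandas documentation on operations with categorical data).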
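
The empty bins in the counts come from the categorical dtype: the category list is fixed by the bin edges, not by the data. A sketch contrasting it with the same labels cast to plain strings (the astype(str) step is only for illustration), where only the bins that occur are counted:

```python
import pandas as pd

df = pd.DataFrame({'percentage': [46.5, 44.2, 100.0, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]
binned = pd.cut(df['percentage'], bins=bins)

print(binned.dtype)  # category

# Categorical: value_counts reports all six bins, empty ones included.
cat_counts = binned.value_counts()

# Plain strings carry no category list, so only the bins that occur are counted.
str_counts = binned.astype(str).value_counts()

print(len(cat_counts), len(str_counts))  # 6 2
```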

Source: utcz.com/qa/426467.html
