如何根据熊猫数据框中的多列获取百分比数?

我在数据框中有20列。 我列出其中4这里作为例子:如何根据熊猫数据框中的多列获取百分比数?

is_guarantee:0或1
hotel_star:0,1,2,3,4,5
ORDER_STATUS:40,60,80
旅程(标签): 0,1,2

is_guarantee hotel_star order_status journey 

0 0 5 60 0

1 1 5 60 0

2 1 5 60 0

3 0 5 60 1

4 0 4 40 0

5 0 4 40 1

6 0 4 40 1

7 0 3 60 0

8 0 2 60 0

9 1 5 60 0

10 0 2 60 0

11 0 2 60 0

Click to View Image

但该系统需要输入发生矩阵像以下格式函数:

Click to View Image

身体能帮助吗?

df1 = pd.DataFrame(index=range(0,20)) 

df1['is_guarantee'] = np.random.choice([0,1], df1.shape[0])

df1['hotel_star'] = np.random.choice([0,1,2,3,4,5], df1.shape[0])

df1['order_status'] = np.random.choice([40,60,80], df1.shape[0])

df1['journey '] = np.random.choice([0,1,2], df1.shape[0])

回答:

我想你需要:

  • 重塑通过meltgroupbysize得到计数,通过unstack
  • 重塑然后划分和每行和参加MultiIndexindex


df = (df.melt('journey') 

.astype(str)

.groupby(['variable', 'journey','value'])

.size()

.unstack(1, fill_value=0))

df = (df.div(df.sum(1), axis=0)

.mul(100)

.add_prefix('journey_')

.set_index(df.index.map(' = '.join))

.rename_axis(None, 1))

print (df)

journey_0 journey_1

hotel_star = 2 100.000000 0.000000

hotel_star = 3 100.000000 0.000000

hotel_star = 4 33.333333 66.666667

hotel_star = 5 80.000000 20.000000

is_guarantee = 0 66.666667 33.333333

is_guarantee = 1 100.000000 0.000000

order_status = 40 33.333333 66.666667

order_status = 60 88.888889 11.111111

以上是 如何根据熊猫数据框中的多列获取百分比数? 的全部内容, 来源链接: utcz.com/qa/262810.html

回到顶部