从分割索引获得的分组值

我需要找到两列(p_1_logreg,p_2_logreg)的最大值,其中比较应该只限制为14行。从分割索引获得的分组值

My csv file

我试图切开我的索引:

int1_str1_str2_int2_str3_int4 

最大应该在哪里INT1,STR1,STR2 INT2和STR3固定行之间只可以发现,和int4会发生变化(从索引0到索引13,依此类推)。

我试图一次修复每个元素并使用groupby,但是我无法遍历int4值。

以下是查找列p_1_label的最大值的代码,但结果不是我正在寻找的。

max_1_row=raw_prob.loc[raw_prob.groupby(raw_prob['id'].str.split('_').str[1])['p_1_'+label].idxmax()] 

max_1_row=max_1_row.loc[raw_prob.groupby(raw_prob['id'].str.split('_').str[3])['p_1_'+label].idxmax()]

max_1_row=max_1_row.loc[raw_prob.groupby(raw_prob['id'].str.split('_').str[5])['p_1_'+label].idxmax()]

任何想法?

回答:

我想你需要DataFrameGroupBy.idxmax替换最后_空字符串,然后通过loc选择:

df = pd.read_csv('myProb.csv', index_col=[0]) 

idx = df.drop('id', 1).groupby(df['id'].str.replace('_\d+$', '')).idxmax()

print (idx.head(15))

p_0_logreg p_1_logreg p_2_logreg

id

6_PanaCleanerJune_sub_12_ICA 2 9 6

6_PanaCleanerJune_sub_13_ICA 17 19 23

6_PanaCleanerJune_sub_14_ICA 34 37 33

6_PanaCleanerJune_sub_15_ICA 52 51 43

6_PanaCleanerJune_sub_17_ICA 66 67 69

6_PanaCleanerJune_sub_18_ICA 82 79 76

6_PanaCleanerJune_sub_19_ICA 89 87 90

6_PanaCleanerJune_sub_20_ICA 98 103 104

6_PanaCleanerJune_sub_21_ICA 114 117 112

6_PanaCleanerJune_sub_22_ICA 129 133 127

6_PanaCleanerJune_sub_23_ICA 145 146 143

6_PanaCleanerJune_sub_24_ICA 155 166 161

6_PanaCleanerJune_sub_25_ICA 176 173 174

6_PanaCleanerJune_sub_26_ICA 186 191 189

6_PanaCleanerJune_sub_27_ICA 202 203 209


df1 = df.loc[idx['p_1_logreg']] 

print (df1.head(15))

id p_0_logreg p_1_logreg p_2_logreg

9 6_PanaCleanerJune_sub_12_ICA_10 0.013452 0.985195 0.001353

19 6_PanaCleanerJune_sub_13_ICA_6 0.051184 0.948816 0.000000

37 6_PanaCleanerJune_sub_14_ICA_10 0.013758 0.979351 0.006890

51 6_PanaCleanerJune_sub_15_ICA_10 0.076056 0.923944 0.000000

67 6_PanaCleanerJune_sub_17_ICA_12 0.051060 0.947660 0.001280

79 6_PanaCleanerJune_sub_18_ICA_10 0.051184 0.948816 0.000000

87 6_PanaCleanerJune_sub_19_ICA_4 0.078162 0.917751 0.004087

103 6_PanaCleanerJune_sub_20_ICA_6 0.076400 0.921263 0.002337

117 6_PanaCleanerJune_sub_21_ICA_6 0.155002 0.791753 0.053245

133 6_PanaCleanerJune_sub_22_ICA_8 0.000000 0.998623 0.001377

146 6_PanaCleanerJune_sub_23_ICA_7 0.017549 0.973995 0.008457

166 6_PanaCleanerJune_sub_24_ICA_13 0.025215 0.974785 0.000000

173 6_PanaCleanerJune_sub_25_ICA_6 0.025656 0.960220 0.014124

191 6_PanaCleanerJune_sub_26_ICA_10 0.098872 0.895526 0.005602

203 6_PanaCleanerJune_sub_27_ICA_8 0.066493 0.932470 0.001037


df2 = df.loc[idx['p_2_logreg']] 

print (df2.head(15))

id p_0_logreg p_1_logreg p_2_logreg

6 6_PanaCleanerJune_sub_12_ICA_7 0.000000 0.000351 0.999649

23 6_PanaCleanerJune_sub_13_ICA_10 0.000000 0.000351 0.999649

33 6_PanaCleanerJune_sub_14_ICA_6 0.080748 0.000352 0.918900

43 6_PanaCleanerJune_sub_15_ICA_2 0.017643 0.000360 0.981996

69 6_PanaCleanerJune_sub_17_ICA_14 0.882449 0.000290 0.117261

76 6_PanaCleanerJune_sub_18_ICA_7 0.010929 0.000360 0.988711

90 6_PanaCleanerJune_sub_19_ICA_7 0.010929 0.000351 0.988720

104 6_PanaCleanerJune_sub_20_ICA_7 0.006714 0.000360 0.992925

112 6_PanaCleanerJune_sub_21_ICA_1 0.869393 0.000339 0.130269

127 6_PanaCleanerJune_sub_22_ICA_2 0.000000 0.000351 0.999649

143 6_PanaCleanerJune_sub_23_ICA_4 0.017218 0.000360 0.982421

161 6_PanaCleanerJune_sub_24_ICA_8 0.369685 0.000712 0.629603

174 6_PanaCleanerJune_sub_25_ICA_7 0.307056 0.000496 0.692448

189 6_PanaCleanerJune_sub_26_ICA_8 0.850195 0.000368 0.149437

209 6_PanaCleanerJune_sub_27_ICA_14 0.000000 0.000351 0.999649

详细信息:

print (df['id'].str.replace('_\d+$', '').head(15)) 

0 6_PanaCleanerJune_sub_12_ICA

1 6_PanaCleanerJune_sub_12_ICA

2 6_PanaCleanerJune_sub_12_ICA

3 6_PanaCleanerJune_sub_12_ICA

4 6_PanaCleanerJune_sub_12_ICA

5 6_PanaCleanerJune_sub_12_ICA

6 6_PanaCleanerJune_sub_12_ICA

7 6_PanaCleanerJune_sub_12_ICA

8 6_PanaCleanerJune_sub_12_ICA

9 6_PanaCleanerJune_sub_12_ICA

10 6_PanaCleanerJune_sub_12_ICA

11 6_PanaCleanerJune_sub_12_ICA

12 6_PanaCleanerJune_sub_12_ICA

13 6_PanaCleanerJune_sub_12_ICA

14 6_PanaCleanerJune_sub_13_ICA

Name: id, dtype: object

以上是 从分割索引获得的分组值 的全部内容, 来源链接: utcz.com/qa/262597.html

回到顶部