Finding the indices of all maximum values in a Pandas DataFrame

Z时代
2024-01-10
Category: Q&A

I need to find all the indices at which the maximum value (per row) occurs in a Pandas DataFrame. For example, if I have a DataFrame like this:

   cat1  cat2  cat3
0     0     2     2
1     3     0     1
2     1     1     0

then the method I am looking for would produce a result like this:

[['cat2', 'cat3'],
 ['cat1'],
 ['cat1', 'cat2']]

This is a list of lists, but other data structures would be fine too.

I can't use df.idxmax(axis=1), because it only returns the first maximum in each row.
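To make the limitation concrete, here is a minimal sketch using the example frame from the question. It only shows that ties are dropped. The variable name first_max is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})

# idxmax reports only the first column label that reaches the row maximum,
# so the ties in rows 0 and 2 are lost.
first_max = df.idxmax(axis=1)
print(first_max.tolist())  # ['cat2', 'cat1', 'cat1']
```

Row 0 should also report 'cat3' and row 2 should also report 'cat2', which is exactly what is missing.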

Answer:

Here is the same information in a different data structure:

In [6]: import numpy as np
In [7]: import pandas as pd
In [8]: df = pd.DataFrame({'cat1':[0,3,1], 'cat2':[2,0,1], 'cat3':[2,1,0]})
In [9]: df
Out[9]: 
   cat1  cat2  cat3
0     0     2     2
1     3     0     1
2     1     1     0
[3 rows x 3 columns]
In [10]: rowmax = df.max(axis=1)

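Comparing each row with its maximum marks every maximum with True. (rowmax.values is used here because recent pandas versions no longer accept rowmax[:,None] on a Series.)

In [82]: df.values == rowmax.values[:,None]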
Out[82]: 
array([[False,  True,  True],
       [ True, False, False],
       [ True,  True, False]], dtype=bool)
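The same mask can also be built without dropping to NumPy. The following is a sketch, assuming the example frame above, that uses DataFrame.eq with axis=0 so the per-row maxima are aligned on the row index. The name mask is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})
rowmax = df.max(axis=1)

# Compare every column against the per-row maximum, matching on the row index.
# The result is a boolean DataFrame that keeps the row and column labels.
mask = df.eq(rowmax, axis=0)
print(mask)
```

Keeping the labels can be convenient when the boolean frame itself is the data structure you want to work with.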

np.where returns the indices at which the boolean array above is True.

In [84]: np.where(df.values == rowmax.values[:,None])
Out[84]: (array([0, 0, 1, 2, 2]), array([1, 2, 0, 0, 1]))

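The first array holds the positions along axis=0 (rows) and the second holds the positions along axis=1 (columns). Each array has 5 values because there are five locations that are True.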
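Since np.where returns integer positions rather than labels, it may help to map them back. This sketch, assuming the same example frame, pairs each row label with the column label of a maximum. The names rows, cols and pairs are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})
rowmax = df.max(axis=1)

# rows and cols are parallel arrays of integer positions of the True cells.
rows, cols = np.where(df.to_numpy() == rowmax.to_numpy()[:, None])

# Translate positions into (row label, column label) pairs.
pairs = list(zip(df.index[rows].tolist(), df.columns[cols].tolist()))
print(pairs)
# [(0, 'cat2'), (0, 'cat3'), (1, 'cat1'), (2, 'cat1'), (2, 'cat2')]
```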

You can use itertools.groupby to build the list of lists you posted, although given the data structure above you may not need to:

In [46]: import itertools as IT
In [47]: import operator
In [48]: idx = np.where(df.values == rowmax.values[:,None])
In [49]: groups = IT.groupby(zip(*idx), key=operator.itemgetter(0))
In [50]: [[df.columns[j] for i, j in grp] for k, grp in groups]
Out[50]: [['cat2', 'cat3'], ['cat1'], ['cat1', 'cat2']]
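If the list of lists is all you need, the groupby step can be replaced by indexing the column labels with each row of the boolean mask. This is a sketch rather than the approach used above: all_idxmax is a hypothetical helper name, not a pandas function, and it assumes numeric data without NaN (a row containing NaN would yield an empty list).

```python
import pandas as pd


def all_idxmax(frame):
    """Return, for each row, the list of column labels holding the row maximum."""
    # Hypothetical helper for illustration; not part of pandas.
    values = frame.to_numpy()
    # keepdims=True keeps the maxima as a column so they broadcast across each row.
    is_max = values == values.max(axis=1, keepdims=True)
    # Boolean-index the column labels once per row.
    return [frame.columns[row].tolist() for row in is_max]


df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})
result = all_idxmax(df)
print(result)  # [['cat2', 'cat3'], ['cat1'], ['cat1', 'cat2']]
```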

Source: utcz.com/qa/422958.html