Perform a groupby on two columns regardless of the order in which they appear in the DataFrame

Z时代
2024-01-10
Category: Q&A

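Question: Perform a groupby on two columns regardless of the order in which they appear in the following DataFrame

Given this DataFrame:

Node_1  Node_2  Time
A       B       6
A       B       4
B       A       2
B       C       5

How can I obtain, using groupby or some other method, the following DataFrame:

Node_1  Node_2  Mean_Time
A       B       4
B       C       5

The Mean_Time in the first row is obtained by averaging over all routes A -> B and B -> A, i.e. (6 + 4 + 2)/3 = 4.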

Answer:

Something along these lines should give you the desired result... it turned out a lot uglier than I had hoped :D

import pandas as pd
data = {'Node_1': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
        'Node_2': {0: 'B', 1: 'B', 2: 'A', 3: 'C'},
        'Time': {0: 6, 1: 4, 2: 2, 3: 5}}
df = pd.DataFrame(data)
# Create new column to group by: sorting makes (B, A) and (A, B) the same key
df["Node"] = df[["Node_1", "Node_2"]].apply(lambda x: tuple(sorted(x)), axis=1)
# Create Mean_time column; select 'Time' so only the numeric column is averaged
df["Mean_time"] = df.groupby('Node')['Time'].transform('mean')
# Drop duplicate rows and drop Node and Time columns
df = df.drop_duplicates("Node").drop(['Node', 'Time'], axis=1)
print(df)

  Node_1 Node_2  Mean_time
0      A      B        4.0
3      B      C        5.0

Another approach is to use:

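# Apply to df while it still has the Node key column,
# i.e. in place of the transform/drop_duplicates steps above
df = (df.groupby('Node', as_index=False)
      .agg({'Node_1': lambda x: x.iloc[0],   # first Node_1 in each group
            'Node_2': lambda x: x.iloc[0],   # first Node_2 in each group
            'Time': 'mean'})                 # string alias, so numpy is not needed
      .drop('Node', axis=1))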
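
The same idea can also be written with named aggregation, which lets the output column be called Mean_Time directly. This is only a sketch of one possible variant, not part of the original answer: the list comprehension used to build the key and the final reset_index(drop=True) are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Node_1": ["A", "A", "B", "B"],
    "Node_2": ["B", "B", "A", "C"],
    "Time": [6, 4, 2, 5],
})

# Order-independent key: (A, B) and (B, A) both become ('A', 'B')
df["Node"] = [tuple(sorted(pair)) for pair in zip(df["Node_1"], df["Node_2"])]

result = (
    df.groupby("Node")
      .agg(Node_1=("Node_1", "first"),     # keep the first Node_1 seen in each group
           Node_2=("Node_2", "first"),     # keep the first Node_2 seen in each group
           Mean_Time=("Time", "mean"))     # average Time over both directions
      .reset_index(drop=True)              # discard the helper key
)
print(result)
```

On the sample data this should leave one row per unordered pair, with a Mean_Time of 4.0 for A/B and 5.0 for B/C.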

Answer:

You can use np.sort to sort the Node_1 and Node_2 values within each row:

nodes = df.filter(regex='Node') 
arr = np.sort(nodes.values, axis=1) 
df.loc[:, nodes.columns] = arr

so that df now looks like:

  Node_1 Node_2  Time
0      A      B     6
1      A      B     4
2      A      B     2
3      B      C     5

With the Node columns sorted, you can groupby/agg as usual:

cols = nodes.columns.tolist()
result = df.groupby(cols).agg('mean').reset_index()

Complete example:

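import numpy as np
import pandas as pd
data = {'Node_1': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
        'Node_2': {0: 'B', 1: 'B', 2: 'A', 3: 'C'},
        'Time': {0: 6, 1: 4, 2: 2, 3: 5}}
df = pd.DataFrame(data)
# Select the columns whose names contain 'Node'
nodes = df.filter(regex='Node')
# Sort the two node labels within each row, so (B, A) becomes (A, B)
arr = np.sort(nodes.values, axis=1)
cols = nodes.columns.tolist()
# Write the sorted labels back into df
df.loc[:, nodes.columns] = arr
# Group on the now order-independent pair and average Time
result = df.groupby(cols).agg('mean').reset_index()
print(result)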

yields

  Node_1 Node_2  Time
0      A      B   4.0
1      B      C   5.0
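
If the input frame should stay unchanged, the sort-then-group steps can be wrapped in a small function that works on a copy. This is a rough sketch; the name mean_by_unordered_pair and its parameters are made up for illustration and are not from the answer above. It also assumes the values in the two columns can be compared with each other (for example, all strings with no missing values), since np.sort has to order them.

```python
import numpy as np
import pandas as pd

def mean_by_unordered_pair(df, pair_cols, value_col):
    """Hypothetical helper: mean of value_col per unordered pair in pair_cols."""
    pair_cols = list(pair_cols)
    # Work on a copy so the caller's DataFrame is not modified
    out = df[pair_cols + [value_col]].copy()
    # Sort the labels within each row, so (B, A) becomes (A, B)
    out[pair_cols] = np.sort(out[pair_cols].to_numpy(), axis=1)
    # Group on the normalised pair and average the value column
    return out.groupby(pair_cols, as_index=False)[value_col].mean()

df = pd.DataFrame({
    "Node_1": ["A", "A", "B", "B"],
    "Node_2": ["B", "B", "A", "C"],
    "Time": [6, 4, 2, 5],
})
result = mean_by_unordered_pair(df, ["Node_1", "Node_2"], "Time")
print(result)
```

Here df keeps its original B/A row, while result holds one row per unordered pair.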

Source: utcz.com/qa/260249.html