Write a Python program to compute the covariance of grouped data and the grouped covariance between two columns of a given DataFrame
Suppose you have a DataFrame, and the covariance computed for the grouped data and for two specific columns is:
Grouped data covariance is:
                 mark1       mark2
subjects
maths    mark1    25.0   12.500000
         mark2    12.5  108.333333
science  mark1    28.0   50.000000
         mark2    50.0  233.333333
Grouped data covariance between two columns:
subjects
maths      12.5
science    50.0
dtype: float64
Solution
To solve this, we will follow these steps:
Define a DataFrame.
Apply the groupby function on the subjects column of the DataFrame:
df.groupby('subjects')
Apply the cov function to the grouped data and store the result in group_data:
group_data = df.groupby('subjects').cov()
Apply a lambda function to each subjects group to compute the covariance between only the mark1 and mark2 columns. It is defined as follows:
df.groupby('subjects').apply(lambda x: x['mark1'].cov(x['mark2']))
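The two results in the steps above are related: the per-group value for mark1 and mark2 is just the off-diagonal entry of each group's covariance matrix. The following is a minimal sketch, assuming the same sample data as the example in this article, that reads those entries out of group_data instead of using a lambda. The names maths_matrix and pair_cov are illustrative and do not appear in the original.

```python
import pandas as pd

df = pd.DataFrame({'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
                   'mark1': [80, 90, 85, 95, 93, 85],
                   'mark2': [85, 90, 70, 75, 95, 65]})

# Covariance matrix per subject; the result has a two-level row index
# (subject, column name) and one column per numeric column.
group_data = df.groupby('subjects').cov()

# The 2x2 covariance matrix for a single subject.
maths_matrix = group_data.loc['maths']
print(maths_matrix)

# Take the 'mark1' row of every subject (second index level),
# then keep its 'mark2' entry: cov(mark1, mark2) for each subject.
pair_cov = group_data.xs('mark1', level=1)['mark2']
print(pair_cov)
```

This variant may be convenient when the full matrices are needed anyway, since it avoids a second pass over the groups.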
Example
Let us look at the following code for a better understanding:
import pandas as pd

# Sample data: two marks per row, grouped by subject
df = pd.DataFrame({'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
                   'mark1': [80, 90, 85, 95, 93, 85],
                   'mark2': [85, 90, 70, 75, 95, 65]})
print("DataFrame is:\n", df)
# Full covariance matrix of the numeric columns within each subject
group_data = df.groupby('subjects').cov()
print("Grouped data covariance is:\n", group_data)
# Covariance between mark1 and mark2 only, one value per subject
result = df.groupby('subjects').apply(lambda x: x['mark1'].cov(x['mark2']))
print("Grouped data covariance between two columns:\n", result)
Output
DataFrame is:
   subjects  mark1  mark2
0     maths     80     85
1     maths     90     90
2     maths     85     70
3   science     95     75
4   science     93     95
5   science     85     65
Grouped data covariance is:
                 mark1       mark2
subjects
maths    mark1    25.0   12.500000
         mark2    12.5  108.333333
science  mark1    28.0   50.000000
         mark2    50.0  233.333333
Grouped data covariance between two columns:
subjects
maths      12.5
science    50.0
dtype: float64
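These values can be checked by hand. By default pandas computes the sample covariance, dividing by n - 1 rather than n. The sketch below recomputes the mark1/mark2 covariance per subject from that definition. sample_cov and manual are hypothetical names introduced here for illustration, not part of pandas or of the original example.

```python
import pandas as pd

df = pd.DataFrame({'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
                   'mark1': [80, 90, 85, 95, 93, 85],
                   'mark2': [85, 90, 70, 75, 95, 65]})

def sample_cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    return ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

# Recompute the covariance for each subject from the definition.
manual = {name: sample_cov(g['mark1'], g['mark2'])
          for name, g in df.groupby('subjects')}

# The same quantity as computed by pandas, for comparison.
by_pandas = df.groupby('subjects')[['mark1', 'mark2']].apply(
    lambda g: g['mark1'].cov(g['mark2']))

for name, value in manual.items():
    print(name, value, by_pandas[name])
```

For maths, the deviations of mark1 from its mean of 85 are -5, 5 and 0, which is why the variance of mark1 in that group is (25 + 25 + 0) / 2 = 25.0.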