Calculate overlap area of two functions

Z时代
2024-01-10
Category: Q&A

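I need to calculate the area where two functions overlap. In this particular simplified example I use normal distributions, but I also need a more general procedure that adapts to other functions.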

Here is my MWE so far:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
# Generate random data, normally distributed.
a = np.random.normal(1., 0.1, 1000)
b = np.random.normal(1., 0.1, 1000)
# Obtain KDE estimates for each set of data.
xmin, xmax = -1., 2.
x_pts = np.mgrid[xmin:xmax:1000j]
# Kernels.
ker_a = stats.gaussian_kde(a)
ker_b = stats.gaussian_kde(b)
# KDEs for plotting.
kde_a = np.reshape(ker_a(x_pts).T, x_pts.shape)
kde_b = np.reshape(ker_b(x_pts).T, x_pts.shape)
# Random sample from a KDE distribution.
sample = ker_a.resample(size=1000)
# Compute the points below which to integrate.
iso = ker_b(sample)
# Filter the sample.
insample = ker_a(sample) < iso
# As per Monte Carlo, the integral is equivalent to the
# probability of drawing a point that gets through the
# filter.
integral = insample.sum() / float(insample.shape[0])
print(integral)
plt.xlim(0.4,1.9)
plt.plot(x_pts, kde_a)
plt.plot(x_pts, kde_b)
plt.show()

where I apply Monte Carlo to obtain the integral.

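The problem with this method is that when I evaluate the sampled points in either distribution with ker_b(sample) (or ker_a(sample)), the values I get lie directly on the KDE curve. Because of this, even clearly overlapping distributions, which should return a common/overlapped area very close to 1, return small values instead (the total area under either curve alone is 1, since they are probability density estimates).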

How can I fix this code so that it gives the expected result?
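One possible repair of the Monte Carlo step, sketched under the assumption that the quantity wanted is the integral of min(f_a, f_b): since that integral equals the expectation, over x drawn from f_a, of min(1, f_b(x)/f_a(x)), averaging that ratio over a sample from ker_a estimates the overlap. The names ratio and overlap_mc, the seeds, and the sample size are illustrative choices, not part of the MWE.

```python
import numpy as np
from scipy import stats

# Same kind of data as in the MWE, seeded so the run is repeatable.
rng = np.random.default_rng(0)
a = rng.normal(1., 0.1, 1000)
b = rng.normal(1., 0.1, 1000)
ker_a = stats.gaussian_kde(a)
ker_b = stats.gaussian_kde(b)

# Draw points from the KDE of a; shape is (1, n) for 1-D data.
sample = ker_a.resample(size=2000, seed=1)

# Density of each sampled point under both KDEs.
fa = ker_a(sample)
fb = ker_b(sample)

# E_{x ~ f_a}[min(1, f_b(x) / f_a(x))] = integral of min(f_a, f_b).
ratio = np.minimum(1.0, fb / fa)
overlap_mc = ratio.mean()
print(overlap_mc)
```

For two samples drawn from the same distribution this estimate should come out close to 1, and it can never exceed 1 because every term of the average is capped at 1.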

This is how I applied Zhenya's answer:

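from scipy.integrate import quad

# Calculate overlap between the two KDEs.
def y_pts(pt):
    # Each kernel returns a 1-element array; take the scalar values.
    y_pt = min(ker_a(pt)[0], ker_b(pt)[0])
    return y_pt
# Store overlap value; quad returns (integral, absolute error estimate).
overlap, abserr = quad(y_pts, -1., 2.)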
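For functions that are not KDEs, the same idea can be sketched on a fixed grid: evaluate both functions, take the pointwise minimum, and integrate it with the trapezoidal rule. The helpers gauss_pdf and overlap_area below are hypothetical names introduced for illustration, and the integration limits and grid size are assumptions that would need adjusting to the functions at hand.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Normal density, used here only as an example function."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def overlap_area(f, g, lo, hi, n=20001):
    """Trapezoidal estimate of the integral of min(f, g) over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    y = np.minimum(f(x), g(x))  # lower envelope of the two curves
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Two unit-variance normals whose means are one standard deviation apart.
area = overlap_area(lambda x: gauss_pdf(x, 0.0, 1.0),
                    lambda x: gauss_pdf(x, 1.0, 1.0),
                    -8.0, 9.0)
print(area)
```

For this pair the exact overlap is 2Φ(-0.5) ≈ 0.617, which gives a quick sanity check. Any two vectorized callables can be passed in place of the lambdas, provided [lo, hi] covers the region where both are non-negligible.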