Set values in a numpy array to NaN by index

I want to set specific values in a numpy array to NaN (to exclude them from a row-wise mean calculation).

I tried:

import numpy

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])

cutoff = [5, 7]

for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan

Looking at x, I only see -9223372036854775808 where I expected NaN.

I thought of an alternative:

for i in range(len(x)):
    for k in range(cutoff[i]):
        x[i][k] = numpy.nan

Nothing happens. What am I doing wrong?

Answer:

@unutbu's solution should get rid of the value error you were getting. If you want to vectorize for performance, you can use boolean indexing like so:

import numpy as np

# Create mask of positions in x (with float datatype) where NaNs are to be put

mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1])

# Put NaNs into masked region of x for the desired output

x[mask] = np.nan

Sample run:

In [92]: x = np.random.randint(0,9,(4,7)).astype(float)

In [93]: x

Out[93]:
array([[ 2.,  1.,  5.,  2.,  5.,  2.,  1.],
       [ 2.,  5.,  7.,  1.,  5.,  4.,  8.],
       [ 1.,  1.,  7.,  4.,  8.,  3.,  1.],
       [ 5.,  8.,  7.,  5.,  0.,  2.,  1.]])

In [94]: cutoff = [5,3,0,6]

In [95]: x[np.asarray(cutoff)[:,None] > np.arange(x.shape[1])] = np.nan

In [96]: x

Out[96]:
array([[ nan,  nan,  nan,  nan,  nan,   2.,   1.],
       [ nan,  nan,  nan,   1.,   5.,   4.,   8.],
       [  1.,   1.,   7.,   4.,   8.,   3.,   1.],
       [ nan,  nan,  nan,  nan,  nan,  nan,   1.]])
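To tie this back to the question: NaN exists only for floating-point dtypes, so an integer array cannot hold it, which is why the original attempts did not produce NaN. The sketch below uses the data from the question, casts to float first (presumably what the referenced answer does; it is not reproduced here), applies the same broadcasting mask, and then takes the row-wise mean with np.nanmean. The names x_int and row_means are introduced only for this illustration.

```python
import numpy as np

# Data from the question; the integer dtype is what prevents storing NaN.
x_int = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                  [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]

# Work on a float copy, since NaN is only representable in float dtypes.
x = x_int.astype(float)

# Broadcasting a (2, 1) column against a (10,) row gives a (2, 10) mask:
# mask[i, j] is True where column j lies before cutoff[i].
mask = np.asarray(cutoff)[:, None] > np.arange(x.shape[1])
x[mask] = np.nan

# nanmean skips NaN entries when averaging along each row.
row_means = np.nanmean(x, axis=1)
print(row_means)
```

Note that np.nanmean emits a RuntimeWarning and returns NaN for any row that is entirely NaN, which would happen if a cutoff covered the whole row.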


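If what you want is the masked mean values, you can modify the vectorized approach proposed earlier to avoid dealing with NaNs altogether and, more importantly, keep x as integer values. Here is the modified approach: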

# Get array version of cutoff

cutoff_arr = np.asarray(cutoff)

# Mask of positions in x which are to be considered for row-wise mean calculations

mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])

# Mask x, calculate the corresponding sum and thus mean values for each row

masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] - cutoff_arr)

Here is a sample run of this solution:

In [61]: x = np.random.randint(0,9,(4,7))

In [62]: x

Out[62]:
array([[5, 0, 1, 2, 4, 2, 0],
       [3, 2, 0, 7, 5, 0, 2],
       [7, 2, 2, 3, 3, 2, 3],
       [4, 1, 2, 1, 4, 6, 8]])

In [63]: cutoff = [5,3,0,6]

In [64]: cutoff_arr = np.asarray(cutoff)

In [65]: mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])

In [66]: mask1

Out[66]:
array([[False, False, False, False, False,  True,  True],
       [False, False, False,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [False, False, False, False, False, False,  True]], dtype=bool)

In [67]: masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] - cutoff_arr)

In [68]: masked_mean_vals

Out[68]: array([ 1. , 3.5 , 3.14285714, 8. ])
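One edge case the formula above does not cover is a cutoff that reaches or exceeds the row length, which makes the denominator zero or negative. A minimal sketch of one way to guard against that is shown below; the function name masked_row_mean and the choice to return NaN for rows with nothing left to average are assumptions made for this example, not part of the answer.

```python
import numpy as np

def masked_row_mean(x, cutoff):
    """Mean of each row of x over columns at or after that row's cutoff.

    Hypothetical helper: rows with no remaining columns yield NaN.
    """
    x = np.asarray(x)
    cutoff_arr = np.asarray(cutoff)

    # keep[i, j] is True where column j should be included for row i.
    keep = cutoff_arr[:, None] <= np.arange(x.shape[1])

    # Count included columns per row instead of computing ncols - cutoff,
    # so an oversized cutoff simply gives a count of zero.
    counts = keep.sum(axis=1)
    sums = np.where(keep, x, 0).sum(axis=1)

    # Divide only where there is something to average; other rows stay NaN.
    means = np.full(x.shape[0], np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

x = np.array([[5, 0, 1, 2, 4, 2, 0],
              [3, 2, 0, 7, 5, 0, 2],
              [7, 2, 2, 3, 3, 2, 3],
              [4, 1, 2, 1, 4, 6, 8]])
print(masked_row_mean(x, [5, 3, 0, 6]))
```

As in the answer, x itself is never modified and keeps its integer dtype; only the returned means are floats.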

Source: utcz.com/qa/425423.html
