NumPy: vectorized way to find, for each element of one array, the closest value in another array

Input:

known_array: NumPy array; consists of scalar values only; shape: (m, 1)

test_array: NumPy array; consists of scalar values only; shape: (n, 1)

Output:

indices: NumPy array; shape: (n, 1); for each value in test_array, the index of the closest value in known_array

residual: NumPy array; shape: (n, 1); for each value in test_array, the difference from the closest value in known_array

Example:

In [17]: known_array = np.array([random.randint(-30,30) for i in range(5)])

In [18]: known_array

Out[18]: array([-24, -18, -13, -30, 29])

In [19]: test_array = np.array([random.randint(-10,10) for i in range(10)])

In [20]: test_array

Out[20]: array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

Sample implementation (not fully vectorized):

def find_nearest(known_array, value):
    idx = (np.abs(known_array - value)).argmin()
    diff = known_array[idx] - value
    return [idx, -diff]

In [22]: indices = np.zeros(len(test_array))

In [23]: residual = np.zeros(len(test_array))

In [24]: for i in range(len(test_array)):
   ....:     [indices[i], residual[i]] = find_nearest(known_array, test_array[i])
   ....:

In [25]: indices

Out[25]: array([ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])

In [26]: residual

Out[26]: array([ 7., 17., 7., 17., 21., 9., 21., 7., 15., 21.])

What is the best way to speed up this task? Cython is an option, but I would always prefer to remove the for loop and keep the code pure NumPy.


Update:

I ran a few small benchmarks to compare the non-vectorized solution with the vectorized one (the accepted answer).

In [48]: [indices1, residual1] = find_nearest_vectorized(known_array, test_array)

In [53]: [indices2, residual2] = find_nearest_non_vectorized(known_array, test_array)

In [54]: indices1==indices2

Out[54]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)

In [55]: residual1==residual2

Out[55]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)

In [56]: %timeit [indices2, residual2] = find_nearest_non_vectorized(known_array, test_array)

10000 loops, best of 3: 173 µs per loop

In [57]: %timeit [indices1, residual1] = find_nearest_vectorized(known_array, test_array)

100000 loops, best of 3: 16.8 µs per loop

A speed-up of about 10x (173 µs versus 16.8 µs per loop)!
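The two functions timed above are not shown in the post. The following is only a sketch of what they might look like, assuming the non-vectorized version wraps the loop from the question and the vectorized version follows the broadcasting approach of the accepted answer. The function bodies and the timing loop are assumptions, not the original benchmark code.

```python
import timeit

import numpy as np


def find_nearest_non_vectorized(known_array, test_array):
    """One argmin per test value, as in the loop from the question."""
    indices = np.zeros(len(test_array), dtype=np.intp)
    residual = np.zeros(len(test_array),
                        dtype=np.result_type(known_array, test_array))
    for i, value in enumerate(test_array):
        idx = np.abs(known_array - value).argmin()
        indices[i] = idx
        residual[i] = value - known_array[idx]
    return indices, residual


def find_nearest_vectorized(known_array, test_array):
    """All pairwise differences at once; needs O(m * n) memory."""
    differences = test_array.reshape(1, -1) - known_array.reshape(-1, 1)
    indices = np.abs(differences).argmin(axis=0)
    residual = np.diagonal(differences[indices,])
    return indices, residual


known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

indices1, residual1 = find_nearest_vectorized(known_array, test_array)
indices2, residual2 = find_nearest_non_vectorized(known_array, test_array)
assert (indices1 == indices2).all() and (residual1 == residual2).all()

# Rough timing; absolute numbers depend on the machine and NumPy version.
for func in (find_nearest_non_vectorized, find_nearest_vectorized):
    seconds = timeit.timeit(lambda: func(known_array, test_array), number=1000)
    print("%s: %.1f us per call" % (func.__name__, seconds / 1000 * 1e6))
```

Both versions rely on argmin returning the first minimum, so they pick the same index when two known values are equally close.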

Clarification:

known_array is not sorted.

I ran the benchmarks as given in the answer by @cyborg below.

Case 1: if known_array is sorted

known_array = np.arange(0,1000)

test_array = np.random.randint(0, 100, 10000)

print('Speedups:')
base_time = time_f('base')
for func_name in ['diffs', 'searchsorted1', 'searchsorted2']:
    print func_name + ' is x%.1f faster than base.' % (base_time / time_f(func_name))
    assert np.allclose(base(known_array, test_array), eval(func_name+'(known_array, test_array)'))


Speedups:

diffs is x0.4 faster than base.

searchsorted1 is x81.3 faster than base.

searchsorted2 is x107.6 faster than base.

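First, for large arrays the diffs method is actually slower than the base loop. It also consumes a lot of RAM, and my system hung when I ran it on real data.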
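The memory problem comes from materializing the full (m, n) difference matrix. One possible workaround, sketched here under the assumption that the broadcasting approach is kept, is to process test_array in chunks so that only an (m, chunk_size) block exists at any time. The function name find_nearest_chunked and the chunk_size parameter are hypothetical and do not appear in the original post.

```python
import numpy as np


def find_nearest_chunked(known_array, test_array, chunk_size=1024):
    """Broadcast in blocks so peak memory is O(m * chunk_size), not O(m * n)."""
    known_array = np.asarray(known_array)
    test_array = np.asarray(test_array)
    indices = np.empty(len(test_array), dtype=np.intp)
    for start in range(0, len(test_array), chunk_size):
        chunk = test_array[start:start + chunk_size]
        # distances has shape (m, len(chunk)): one column per test value
        distances = np.abs(chunk[np.newaxis, :] - known_array[:, np.newaxis])
        indices[start:start + chunk_size] = distances.argmin(axis=0)
    # Same sign convention as the question: test value minus nearest known value
    residual = test_array - known_array[indices]
    return indices, residual


known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])
indices, residual = find_nearest_chunked(known_array, test_array, chunk_size=3)
print(indices)
print(residual)
```

The total work is still proportional to m * n, so this bounds memory use but is not expected to beat a search on sorted data for large inputs.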

Case 2: when known_array is not sorted; this represents the actual scenario

known_array = np.random.randint(0,100,100)

test_array = np.random.randint(0, 100, 100)


Speedups:

diffs is x8.9 faster than base.

AssertionError Traceback (most recent call last)

<ipython-input-26-3170078c217a> in <module>()

5 for func_name in ['diffs', 'searchsorted1', 'searchsorted2']:

6 print func_name + ' is x%.1f faster than base.' % (base_time / time_f(func_name))

----> 7 assert np.allclose(base(known_array, test_array), eval(func_name+'(known_array, test_array)'))

AssertionError:

searchsorted1 is x14.8 faster than base.

I should also note that the method needs to be memory efficient, otherwise my 8 GB of RAM is not enough. The base case easily meets this requirement.
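The searchsorted functions benchmarked above are not reproduced in this post, and the failed assertion suggests they assume sorted input. The following is a sketch, not @cyborg's code, of one way to combine np.argsort with np.searchsorted so that an unsorted known_array is handled with only O(m + n) extra memory. The name find_nearest_searchsorted is hypothetical, and the sketch assumes known_array has at least two elements.

```python
import numpy as np


def find_nearest_searchsorted(known_array, test_array):
    """Nearest-value lookup via binary search on a sorted copy of known_array."""
    known_array = np.asarray(known_array)
    test_array = np.asarray(test_array)
    # Sort once, remembering where each sorted value sat in the original array
    order = np.argsort(known_array, kind="stable")
    sorted_known = known_array[order]
    # Insertion point of each test value, clipped so both neighbours exist
    right = np.clip(np.searchsorted(sorted_known, test_array),
                    1, len(sorted_known) - 1)
    left = right - 1
    # Keep whichever neighbour is closer (ties go to the smaller known value)
    use_left = (test_array - sorted_known[left]) <= (sorted_known[right] - test_array)
    nearest_sorted = np.where(use_left, left, right)
    # Map positions in the sorted copy back to indices into known_array
    indices = order[nearest_sorted]
    residual = test_array - known_array[indices]
    return indices, residual


known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])
indices, residual = find_nearest_searchsorted(known_array, test_array)
print(indices)
print(residual)
```

When two known values are equally close, this sketch may return a different index than the argmin-based versions, although the absolute residual is the same.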

Answer:

For example, you can compute all the differences in one go with:

differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))

and then use argmin and fancy indexing, together with np.diagonal, to get the desired indices and differences:

indices = np.abs(differences).argmin(axis=0)

residual = np.diagonal(differences[indices,])

So for

>>> known_array = np.array([-24, -18, -13, -30,  29])

>>> test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

one gets

>>> indices

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

>>> residual

array([ 7, 17, 7, 17, 21, 9, 21, 7, 15, 21])
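Note that differences[indices,] selects n whole rows and so builds an intermediate (n, n) array before np.diagonal picks out its diagonal. As a possible refinement, offered as a sketch and not as part of the original answer, the same residuals can be read off with paired integer indexing or computed directly from the indices. The variable names below are illustrative.

```python
import numpy as np

known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

differences = test_array.reshape(1, -1) - known_array.reshape(-1, 1)
indices = np.abs(differences).argmin(axis=0)

# The answer's approach: select n rows, then take the diagonal of the (n, n) result
residual_diagonal = np.diagonal(differences[indices,])

# Paired indexing: element (indices[j], j) for each column j, no (n, n) array
columns = np.arange(len(test_array))
residual_paired = differences[indices, columns]

# Or skip the difference matrix entirely once the indices are known
residual_direct = test_array - known_array[indices]

assert (residual_diagonal == residual_paired).all()
assert (residual_diagonal == residual_direct).all()
print(residual_paired)
```

The (m, n) matrix differences is still required to find the indices, so this only removes the extra (n, n) intermediate.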
