What is the fastest way to prepare data for an RNN with numpy?

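I currently have a (1631160, 78) NumPy array as the input to my neural network. I would like to try an LSTM, which requires a 3D structure as input data. I currently use the code below to generate the required 3D structure, but it is extremely slow (ETA > 1 day). Is there a better way to do this with numpy?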

My current code to generate the data:

import datetime
import sys
import time

import numpy as np


def transform_for_rnn(input_x, input_y, window_size):
    output_x = None
    start_t = time.time()
    # Stop at len(input_x) - window_size so that every window is full-length
    # and the number of windows matches output_y below.
    for i in range(len(input_x) - window_size):
        if i > 100 and i % 100 == 0:
            sys.stdout.write('\rTransform Data: %d/%d\tETA:%s' % (i, len(input_x), str(datetime.timedelta(seconds=(time.time() - start_t) / i * (len(input_x) - i)))))
            sys.stdout.flush()
        if output_x is None:
            output_x = np.array([input_x[i:i+window_size, :]])
        else:
            tmp = np.array([input_x[i:i+window_size, :]])
            # Copies the whole accumulated array on every iteration.
            output_x = np.concatenate((output_x, tmp))
    print()
    output_y = input_y[window_size:]
    assert len(output_x) == len(output_y)
    return output_x, output_y

Answer:

Here is a vectorized approach that builds output_x using NumPy strides:

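# Number of full windows that fit along the first axis
nrows = input_x.shape[0] - window_size + 1
p, q = input_x.shape
# Byte strides along rows (m) and columns (n)
m, n = input_x.strides
strided = np.lib.stride_tricks.as_strided
# Stepping one window or one row inside a window both advance by one input row (m)
out = strided(input_x, shape=(nrows, window_size, q), strides=(m, m, n))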

Sample run:

In [83]: input_x
Out[83]:
array([[ 0.73089384,  0.98555845,  0.59818726],
       [ 0.08763718,  0.30853945,  0.77390923],
       [ 0.88835985,  0.90506367,  0.06204614],
       [ 0.21791334,  0.77523643,  0.47313278],
       [ 0.93324799,  0.61507976,  0.40587073],
       [ 0.49462016,  0.00400835,  0.66401908]])

In [84]: window_size = 4

In [85]: out
Out[85]:
array([[[ 0.73089384,  0.98555845,  0.59818726],
        [ 0.08763718,  0.30853945,  0.77390923],
        [ 0.88835985,  0.90506367,  0.06204614],
        [ 0.21791334,  0.77523643,  0.47313278]],

       [[ 0.08763718,  0.30853945,  0.77390923],
        [ 0.88835985,  0.90506367,  0.06204614],
        [ 0.21791334,  0.77523643,  0.47313278],
        [ 0.93324799,  0.61507976,  0.40587073]],

       [[ 0.88835985,  0.90506367,  0.06204614],
        [ 0.21791334,  0.77523643,  0.47313278],
        [ 0.93324799,  0.61507976,  0.40587073],
        [ 0.49462016,  0.00400835,  0.66401908]]])
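As a quick sanity check on the sample run, the strided result can be compared against windows built by plain slicing. The sketch below wraps the strided snippet in a helper; the name strided_windows and the random test data are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np


def strided_windows(input_x, window_size):
    """Return overlapping row windows of a 2D array as a 3D view (no copy)."""
    nrows = input_x.shape[0] - window_size + 1
    q = input_x.shape[1]
    m, n = input_x.strides
    return np.lib.stride_tricks.as_strided(
        input_x, shape=(nrows, window_size, q), strides=(m, m, n))


# Hypothetical small input with the same shape as the sample run
rng = np.random.default_rng(0)
input_x = rng.random((6, 3))
window_size = 4

out = strided_windows(input_x, window_size)

# Reference result: one explicit slice per window, stacked into a copy
expected = np.stack([input_x[i:i + window_size] for i in range(out.shape[0])])

print(out.shape)                      # (3, 4, 3)
print(np.array_equal(out, expected))  # True
```

The slicing version copies every window, so it is only suitable as a check on small inputs.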

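The as_strided call creates a view into the input array rather than a copy, so it is memory efficient. In most cases, this should also translate into a performance benefit for any further operations on it. Let's verify that it is indeed a view: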

In [86]: np.may_share_memory(out,input_x)
Out[86]: True # Doesn't guarantee, but is sufficient in most cases
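np.may_share_memory only compares memory bounds, which is why it does not guarantee overlap. Where an exact answer is wanted, NumPy also provides np.shares_memory, which can be slower on large or complicated strided arrays. A minimal sketch on a small made-up input (the np.arange data is an assumption for illustration):

```python
import numpy as np

# Hypothetical small input with the same layout as the sample run
input_x = np.arange(18, dtype=float).reshape(6, 3)
window_size = 4

nrows = input_x.shape[0] - window_size + 1
m, n = input_x.strides
out = np.lib.stride_tricks.as_strided(
    input_x, shape=(nrows, window_size, input_x.shape[1]), strides=(m, m, n))

# Exact overlap check: the strided view uses the input's buffer
view_shares = np.shares_memory(out, input_x)
# A copy owns a fresh buffer, so it does not overlap with the input
copy_shares = np.shares_memory(out.copy(), input_x)

print(view_shares)  # True
print(copy_shares)  # False
```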

Another sure-shot way to verify is to set some values in out and check the input:

In [87]: out[0] = 0

In [88]: input_x
Out[88]:
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.93324799,  0.61507976,  0.40587073],
       [ 0.49462016,  0.00400835,  0.66401908]])
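Since writes through the view reach the original array, it may be safer to work with a read-only view when the windows are only fed to a model. The sketch below is one possible variant, not the answer's method: it assumes NumPy 1.20 or later, where np.lib.stride_tricks.sliding_window_view is available and returns a read-only view by default. It also drops the last window so that the windows line up with input_y[window_size:] as in the question. The function name make_rnn_windows and the np.arange test data are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_rnn_windows(input_x, input_y, window_size):
    """Pair each window of rows with the label that follows it (no copy of input_x)."""
    # Shape (N - window_size + 1, q, window_size): the window axis is appended last
    windows = sliding_window_view(input_x, window_size, axis=0)
    # Reorder to (num_windows, window_size, q), the layout an LSTM expects
    windows = windows.transpose(0, 2, 1)
    output_y = input_y[window_size:]
    # Drop the final window, which has no following label
    output_x = windows[:len(output_y)]
    return output_x, output_y


# Hypothetical small input
input_x = np.arange(18, dtype=float).reshape(6, 3)
input_y = np.arange(6)

output_x, output_y = make_rnn_windows(input_x, input_y, 4)

print(output_x.shape, output_y.shape)  # (2, 4, 3) (2,)
print(output_x.flags.writeable)        # False
```

If a framework needs a writable or contiguous array, calling output_x.copy() materializes the windows, at the cost of roughly window_size times the memory of the input.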

Source: utcz.com/qa/261160.html
