How do I add an attention mechanism in Keras?

I am currently using this piece of code, taken from a discussion on GitHub. This is the code for the attention mechanism:

from keras.layers import Input, Embedding, LSTM, Dense, Flatten, Activation, RepeatVector, Permute, Lambda, merge
from keras import backend as K

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
        input_dim=vocab_size,
        output_dim=embedding_size,
        input_length=max_length,
        trainable=False,
        mask_zero=False
    )(_input)

activations = LSTM(units, return_sequences=True)(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)

# weight the LSTM activations and sum over time (merge is the Keras 1 functional API)
sent_representation = merge([activations, attention], mode='mul')
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

probabilities = Dense(3, activation='softmax')(sent_representation)

Is this the right way to do it? I was somewhat expecting a TimeDistributed layer to be involved, since the attention mechanism is distributed over every time step of the RNN. I need someone to confirm that this implementation is a correct implementation of an attention mechanism. Thank you.

Answer:

If you want attention over the time dimension, then this part of your code seems correct to me:

activations = LSTM(units, return_sequences=True)(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)

sent_representation = merge([activations, attention], mode='mul')

You have computed the attention vector of shape (batch_size, max_length) here:

attention = Activation('softmax')(attention)
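To see why the weights come out with that shape, here is a minimal shape check written against tf.keras; the sizes max_length = 10, units = 32 and the 8-dimensional dummy input are hypothetical stand-ins for the embedded sequence:

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, Flatten, Activation
from tensorflow.keras.models import Model

max_length, units = 10, 32                              # hypothetical sizes
inp = Input(shape=(max_length, 8))                      # stand-in for the embedded sequence
activations = LSTM(units, return_sequences=True)(inp)   # (batch, max_length, units)
scores = Dense(1, activation='tanh')(activations)       # one score per time step: (batch, max_length, 1)
scores = Flatten()(scores)                               # (batch, max_length)
weights = Activation('softmax')(scores)                  # attention weights over the time steps

model = Model(inp, weights)
print(model.output_shape)                                # (None, 10), i.e. (batch_size, max_length)

Note also that in recent Keras versions a Dense layer applied to a 3D tensor operates on the last axis independently for every time step, which is why no explicit TimeDistributed wrapper is needed around the scoring Dense(1).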

I have never seen this code before, so I can't say whether this part is actually correct:

K.sum(xin, axis=-2)
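For what it's worth, here is a small NumPy sketch of what that Lambda computes (the sizes are hypothetical): for a weighted tensor of shape (batch_size, max_length, units), axis=-2 is the time axis, so the sum collapses the time steps and leaves the weighted sum of the hidden states, of shape (batch_size, units):

import numpy as np

batch, max_length, units = 2, 5, 3                    # hypothetical sizes
activations = np.random.rand(batch, max_length, units)
weights = np.random.rand(batch, max_length)
weights /= weights.sum(axis=1, keepdims=True)         # normalised like a softmax output

# what the RepeatVector/Permute/mul chain produces: weights broadcast across the units axis
weighted = activations * weights[:, :, None]          # (batch, max_length, units)

# the Lambda layer: K.sum(xin, axis=-2) sums over the time axis
context = weighted.sum(axis=-2)                       # (batch, units)

# reference: explicit weighted sum of the hidden states
expected = np.einsum('btu,bt->bu', activations, weights)
assert np.allclose(context, expected)
print(context.shape)                                  # (2, 3)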

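As a closing side note, merge(..., mode='mul') is the Keras 1 functional API and was removed in Keras 2; a rough equivalent of the weighting step under the Keras 2 layer API (assuming the same activations, attention and units as above) might look like this:

from keras.layers import Multiply, Lambda
from keras import backend as K

# element-wise product of the LSTM activations and the repeated/permuted weights,
# replacing merge([activations, attention], mode='mul') from Keras 1
sent_representation = Multiply()([activations, attention])
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)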
