TensorFlow gives NaN gradients when computing with sparse tensors

Z时代
2024-01-10
Category: Q&A

The following snippet comes from a fairly large piece of code, but hopefully I can give all the necessary information:

y2 = tf.matmul(y1, ymask)

dist = tf.norm(ystar - y2, axis=0)

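y1 and y2 are 128x30 and ymask is 30x30. ystar is 128x30. dist is 1x30. When ymask is the identity matrix, everything works fine. But when I set it to all zeros apart from a single 1 on the diagonal (so that all but one of the columns of y2 are set to zero), I get NaNs for the gradient of dist with respect to y2, computed with tf.gradients(dist, [y2]). The specific value of dist is [0, 0, 7.9, 0, ...]; all the ystar-y2 values are roughly in the range (-1, 1) in the third column and zero everywhere else.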

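I'm confused about why a numerical problem would appear here, since there are no logs or divisions. Is this underflow? Am I missing something in the maths?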

For context, I'm doing this to try to train the individual dimensions of y one at a time, using the whole network.

A longer version that reproduces the problem:

import tensorflow as tf 
import numpy as np 
import pandas as pd 
batchSize = 128 
eta = 0.8 
tasks = 30 
imageSize = 32**2 
groups = 3 
tasksPerGroup = 10 
trainDatapoints = 10000 
w = np.zeros([imageSize, groups * tasksPerGroup]) 
toyIndex = 0 
for toyLoop in range(groups): 
    m = np.ones([imageSize]) * np.random.randn(imageSize) 
    for taskLoop in range(tasksPerGroup): 
        w[:, toyIndex] = m * 0.1 * np.random.randn(1)
        toyIndex += 1
xRand = np.random.normal(0, 0.5, (trainDatapoints, imageSize)) 
taskLabels = np.matmul(xRand, w) + np.random.normal(0,0.5,(trainDatapoints, groups * tasksPerGroup)) 
DF = np.concatenate((xRand, taskLabels), axis=1) 
trainDF = pd.DataFrame(DF[:trainDatapoints, ]) 
# define graph variables 
x = tf.placeholder(tf.float32, [None, imageSize]) 
W = tf.Variable(tf.zeros([imageSize, tasks])) 
b = tf.Variable(tf.zeros([tasks])) 
ystar = tf.placeholder(tf.float32, [None, tasks]) 
ymask = tf.placeholder(tf.float32, [tasks, tasks]) 
dataLength = tf.cast(tf.shape(ystar)[0],dtype=tf.float32) 
y1 = tf.matmul(x, W) + b 
y2 = tf.matmul(y1,ymask) 
dist = tf.norm(ystar-y2,axis=0) 
mse = tf.reciprocal(dataLength) * tf.reduce_mean(tf.square(dist)) 
grads = tf.gradients(dist, [y2]) 
trainStep = tf.train.GradientDescentOptimizer(eta).minimize(mse) 
# build graph 
init = tf.global_variables_initializer() 
sess = tf.Session() 
sess.run(init) 
randTask = np.random.randint(0, 9) 
ymaskIn = np.zeros([tasks, tasks]) 
ymaskIn[randTask, randTask] = 1 
batch = trainDF.sample(batchSize) 
batch_xs = batch.iloc[:, :imageSize] 
batch_ys = np.zeros([batchSize, tasks]) 
batch_ys[:, randTask] = batch.iloc[:, imageSize + randTask] 
gradOut = sess.run(grads, feed_dict={x: batch_xs, ystar: batch_ys, ymask: ymaskIn}) 
sess.run(trainStep, feed_dict={x: batch_xs, ystar: batch_ys, ymask:ymaskIn})

Answer:

Here is a very simple reproduction:

import tensorflow as tf  # TensorFlow 1.x graph-mode API
with tf.Graph().as_default():
    y = tf.zeros(shape=[1], dtype=tf.float32)
    dist = tf.norm(y, axis=0)
    # Gradient of the norm with respect to an all-zero input
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())

This prints:

[ nan]

The problem is that tf.norm computes sum(x**2)**0.5. Its gradient is x / sum(x**2) ** 0.5 (see e.g. https://math.stackexchange.com/a/84333), so when sum(x**2) is zero we are dividing by zero.
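To make the formula concrete, here is a minimal plain-Python sketch (no TensorFlow needed) that checks the analytic gradient x / sum(x**2) ** 0.5 against a finite-difference estimate for a non-zero column, then evaluates it on an all-zero column. The helper names l2_norm, l2_norm_grad and numeric_grad are made up for this illustration, and the NaN for the zero case is returned by hand, because Python raises ZeroDivisionError on 0.0 / 0.0 where IEEE 754 float arithmetic gives NaN.

```python
import math

def l2_norm(x):
    """Euclidean norm sum(x**2) ** 0.5 of a list of floats."""
    return math.sqrt(sum(v * v for v in x))

def l2_norm_grad(x):
    """Analytic gradient x / sum(x**2) ** 0.5, one entry per element of x."""
    n = l2_norm(x)
    if n == 0.0:
        # The formula becomes 0 / 0 here. Plain Python would raise
        # ZeroDivisionError; IEEE 754 float arithmetic gives NaN instead,
        # so NaN is returned to mirror what the question observed.
        return [math.nan] * len(x)
    return [v / n for v in x]

def numeric_grad(x, h=1e-6):
    """Central finite-difference estimate of the gradient of l2_norm."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((l2_norm(up) - l2_norm(down)) / (2 * h))
    return grad

active_column = [3.0, 4.0]  # stands in for the one unmasked column of ystar - y2
masked_column = [0.0, 0.0]  # stands in for a column of ystar - y2 that is all zeros

print(l2_norm_grad(active_column))  # analytic gradient
print(numeric_grad(active_column))  # should agree closely with the line above
print(l2_norm_grad(masked_column))  # every entry is NaN
```

In the question, 29 of the 30 columns of ystar - y2 are exactly zero, so each of those columns hits the 0 / 0 case, while the single active column gets a finite gradient.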

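There is not much that can be done by special-casing this: the gradient as x approaches all zeros depends on the direction it approaches from. For example, if x is a single-element vector, the limit as x approaches 0 is either 1 or -1, depending on which side of zero it approaches from.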
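The single-element case can be checked numerically. The sketch below is plain Python rather than TensorFlow, and grad_1d is a hypothetical helper name. It evaluates x / sum(x**2) ** 0.5 for ever smaller positive and negative inputs.

```python
import math

def grad_1d(x):
    """Gradient of the norm of a one-element vector [x], for x != 0."""
    return x / math.sqrt(x * x)

steps = [1e-3, 1e-6, 1e-9]
from_right = [grad_1d(h) for h in steps]   # approaching 0 from above
from_left = [grad_1d(-h) for h in steps]   # approaching 0 from below

for h, r, l in zip(steps, from_right, from_left):
    print(h, r, l)
```

The values stay near +1 on one side and near -1 on the other however small the step gets, so no single value at exactly zero is consistent with both sides.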

So, as far as solutions go, you could just add a small epsilon:

import tensorflow as tf  # TensorFlow 1.x graph-mode API
def safe_norm(x, epsilon=1e-12, axis=None):
    # epsilon keeps the argument of sqrt strictly positive, so the gradient stays finite
    return tf.sqrt(tf.reduce_sum(x ** 2, axis=axis) + epsilon)
with tf.Graph().as_default():
    y = tf.constant([0.])
    dist = safe_norm(y, axis=0)
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())

This prints:

[ 0.]

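Note that this is not the true Euclidean norm. It is a good approximation as long as the input is much larger than epsilon (more precisely, as long as the sum of squares of the input is much larger than epsilon).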
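A rough feel for how good the approximation is can be had from the plain-Python sketch below, which mirrors the safe_norm formula without TensorFlow. The names true_norm, safe_norm_py and safe_norm_grad_py, and the test vector [3, 4] scaled down step by step, are assumptions made for this illustration.

```python
import math

def true_norm(x):
    """Exact Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in x))

def safe_norm_py(x, epsilon=1e-12):
    """Same formula as safe_norm above: sqrt(sum(x**2) + epsilon)."""
    return math.sqrt(sum(v * v for v in x) + epsilon)

def safe_norm_grad_py(x, epsilon=1e-12):
    """Gradient of safe_norm_py: x / sqrt(sum(x**2) + epsilon)."""
    n = safe_norm_py(x, epsilon)
    return [v / n for v in x]

for scale in (1.0, 1e-3, 1e-6, 1e-7, 0.0):
    x = [3.0 * scale, 4.0 * scale]
    # Columns: scale, exact norm, smoothed norm, smoothed gradient
    print(scale, true_norm(x), safe_norm_py(x), safe_norm_grad_py(x))
```

For inputs of ordinary size the two norms agree to many digits. Once the exact norm drops towards sqrt(epsilon), which is 1e-6 for the default, the smoothed value flattens out near 1e-6 instead of going to zero, and at exactly zero the gradient is 0 rather than NaN.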

