TensorFlow matmul with a constant performs worse than with tf.random

I use TensorFlow for some non-DL computation, and I ran into behavior I don't understand. I am benchmarking the multiplication of a square matrix with itself, tf.matmul(a, a), in two cases:

  1. the matrix is created with tf.constant
  2. the matrix is randomly initialized on each run

My expectation was that the first case would pay some one-time overhead to transfer the initial data, 100 MB (a 5000x5000 matrix of float32), while the second case would execute slightly slower because of the random initialization on each run.
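For reference, the 100 MB figure follows directly from the matrix dimensions, since float32 is 4 bytes per element:

```python
# 5000 x 5000 matrix of float32 (4 bytes per element)
size = 5000
n_bytes = size * size * 4
print(n_bytes)        # 100000000 bytes
print(n_bytes / 1e6)  # 100.0 MB
```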

However, what I see is that multiplying the constant is much slower, even across consecutive runs within the same session.

Code

import tensorflow as tf
import numpy as np
from timeit import timeit
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # nospam

SIZE = 5000
NUM_RUNS = 10

a = np.random.random((SIZE, SIZE))

_const_a = tf.constant(a, dtype=tf.float32, name="Const_A")
_mul_const_a = tf.matmul(_const_a, _const_a, name="Mul_Const")

_random_a = tf.random_uniform((SIZE, SIZE), dtype=tf.float32, name="Random_A")
_mul_random_a = tf.matmul(_random_a, _random_a, name="Mul_Random")

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as s:
    # Run once to make sure everything is initialised
    s.run((_const_a, _mul_const_a, _random_a, _mul_random_a))

    # timeit
    print("TF with const\t", timeit(lambda: s.run(_mul_const_a.op), number=NUM_RUNS))
    print("TF with random\t", timeit(lambda: s.run(_mul_random_a.op), number=NUM_RUNS))

Output

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1
Random_A/sub: (Sub): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/RandomUniform: (RandomUniform): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:0
Random_A: (Add): /job:localhost/replica:0/task:0/device:GPU:0
Mul_Random: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
Mul_Const: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/max: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/min: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const_A: (Const): /job:localhost/replica:0/task:0/device:GPU:0
TF with const 2.9953213009994215
TF with random 0.513827863998813

Answer:

YMMV; I get the opposite result on my modest K1100M.

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro K1100M, pci bus id: 0000:01:00.0, compute capability: 3.0
Random_A/sub: (Sub): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/RandomUniform: (RandomUniform): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:0
Random_A: (Add): /job:localhost/replica:0/task:0/device:GPU:0
Mul_Random: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
Mul_Const: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/max: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/min: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Random_A/shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const_A: (Const): /job:localhost/replica:0/task:0/device:GPU:0
TF with const 4.3167382130868175
TF with random 9.889055849542306

Answer:

The first call to session.run() in TensorFlow is disproportionately expensive. If you want to benchmark, remember to call it repeatedly and time only the later calls.
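The warm-up principle applies to any benchmark, not just TensorFlow. A minimal sketch of the pattern with timeit, using a plain Python function as a stand-in for s.run(...):

```python
from timeit import timeit

def work():
    # Stand-in for s.run(...); any expensive call works the same way.
    return sum(i * i for i in range(10_000))

work()  # warm-up run: pays one-time setup costs outside the measurement

# Measure only the steady-state cost, averaged over repeated calls.
n_runs = 10
total = timeit(work, number=n_runs)
print(f"avg per run: {total / n_runs:.6f}s")
```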

That said, in your case, unless you disable constant folding, you should see almost no time spent in the constant case, because your graph would simply fetch the pre-folded constant.
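To actually benchmark the constant MatMul at run time, graph optimizations (including constant folding) can be dialed down via the session config. This is a sketch against the TF 1.x API the question uses (in TF 2.x the same options live under tf.compat.v1); whether L0 disables every folding pass can vary by TensorFlow version:

```python
import tensorflow as tf

# Set the graph optimizer to level L0 so Mul_Const is executed at
# run time rather than being pre-computed once during graph optimization.
config = tf.ConfigProto(
    graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)
    )
)

with tf.Session(config=config) as s:
    ...  # run the benchmark as before
```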
