I have the following sample code to test `BasicRNNCell`. I want to get hold of its internal matrices so that I can compute the values of `output_res` and `newstate_res` with my own code, to make sure I can reproduce them.
In the TensorFlow source code it says `output = new_state = act(W * input + U * state + B)`. Does anyone know how to get `W` and `U`? (I tried to access `cell._kernel`, but it is not available.)
```
$ cat ./main.py
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:
import tensorflow as tf
import numpy as np

batch_size = 4
vector_size = 3
inputs = tf.placeholder(tf.float32, [batch_size, vector_size])

num_units = 2
state = tf.zeros([batch_size, num_units], tf.float32)

cell = tf.contrib.rnn.BasicRNNCell(num_units=num_units)
output, newstate = cell(inputs=inputs, state=state)

X = np.zeros([batch_size, vector_size])
#X = np.ones([batch_size, vector_size])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output_res, newstate_res = sess.run([output, newstate], feed_dict={inputs: X})
    print(output_res)
    print(newstate_res)
    sess.close()
$ ./main.py
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
```
Short answer: you were right that what you are after is `cell._kernel`. Here is some code that gets the kernel (and the bias) using the `variables` property, which most TensorFlow RNN cells have:
```python
import tensorflow as tf
import numpy as np

batch_size = 4
vector_size = 3
inputs = tf.placeholder(tf.float32, [batch_size, vector_size])

num_units = 2
state = tf.zeros([batch_size, num_units], tf.float32)

cell = tf.contrib.rnn.BasicRNNCell(num_units=num_units)
output, newstate = cell(inputs=inputs, state=state)

print("Output of cell.variables is a list of Tensors:")
print(cell.variables)
kernel, bias = cell.variables

X = np.zeros([batch_size, vector_size])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output_, newstate_, k_, b_ = sess.run(
        [output, newstate, kernel, bias], feed_dict={inputs: X})
    print("Output:")
    print(output_)
    print("New State == Output:")
    print(newstate_)
    print("\nKernel:")
    print(k_)
    print("\nBias:")
    print(b_)
```
Output:
```
Output of cell.variables is a list of Tensors:
[<tf.Variable 'basic_rnn_cell/kernel:0' shape=(5, 2) dtype=float32_ref>,
 <tf.Variable 'basic_rnn_cell/bias:0' shape=(2,) dtype=float32_ref>]
Output:
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
New State == Output:
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]

Kernel:
[[ 0.41417515 -0.64997244]
 [-0.40868729 -0.90995187]
 [ 0.62134564 -0.88962835]
 [-0.35878009 -0.25680023]
 [ 0.35606658 -0.83596271]]

Bias:
[ 0.  0.]
```
Long answer: you also asked how to get W and U. Let me copy the implementation of `call` and discuss where W and U are.
```python
def call(self, inputs, state):
    """Most basic RNN: output = new_state = act(W * input + U * state + B)."""
    gate_inputs = math_ops.matmul(
        array_ops.concat([inputs, state], 1), self._kernel)
    gate_inputs = nn_ops.bias_add(gate_inputs, self._bias)
    output = self._activation(gate_inputs)
    return output, output
```
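For context, the cell's `build` method in the same source file is where that single kernel is created. The sketch below is paraphrased from the TF 1.x `rnn_cell_impl.py` (exact names are from memory, so treat them as approximate); it shows why the kernel printed above has shape (5, 2): the input weights and the state weights are stacked into one matrix of shape (input_depth + num_units, num_units), here (3 + 2, 2).

```python
# Paraphrased sketch of BasicRNNCell.build from TF 1.x rnn_cell_impl.py;
# names are from memory, not an authoritative copy of the source.
def build(self, inputs_shape):
    input_depth = inputs_shape[1].value
    # W (input weights) and U (state weights) live in one stacked matrix.
    self._kernel = self.add_variable(
        "kernel", shape=[input_depth + self._num_units, self._num_units])
    self._bias = self.add_variable(
        "bias", shape=[self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))
    self.built = True
```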
It does not look like there are a W and a U, but they are there. Essentially, the first `vector_size` rows of the kernel are W, and the next `num_units` rows of the kernel are U. Maybe it helps to look at the element-wise math, with some notation first:
I use m as a generic batch index, v for `vector_size`, n for `num_units`, and b for `batch_size`; [ ; ] denotes concatenation. Since TensorFlow is batch-major, the implementation uses right multiplication by the matrices.
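The original answer showed the math as an image; the display below is my reconstruction of it, consistent with the `call` implementation above:

$$
\mathrm{output} \;=\; \mathrm{act}\bigl(\,[\,\mathrm{input}\,;\,\mathrm{state}\,]\cdot\mathrm{kernel} + B\,\bigr),
\qquad
\mathrm{kernel} \;=\; \begin{bmatrix} W \\ U \end{bmatrix}
$$

so that, element-wise,

$$
\mathrm{output}_{m,j} \;=\; \mathrm{act}\!\Bigl(\sum_{i=1}^{v} \mathrm{input}_{m,i}\,W_{i,j} \;+\; \sum_{k=1}^{n} \mathrm{state}_{m,k}\,U_{k,j} \;+\; B_{j}\Bigr)
$$

Here input is b × v, state is b × n, W is v × n, U is n × n, and B has length n; in the example above that makes the kernel (3 + 2) × 2 = 5 × 2, matching the printed shape.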
Since this is a very basic RNN, `output == new_state`: the "history" fed into the next iteration is simply the output of the current iteration.
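To close the loop on the original goal, here is a minimal NumPy sketch of reproducing the cell's output by hand, continuing from the variables `k_`, `b_`, `X`, `vector_size`, etc. of the snippet above. It assumes the default `tanh` activation of `BasicRNNCell` and splits the kernel by rows as described:

```python
import numpy as np

# Split the (vector_size + num_units, num_units) kernel into W and U:
# the first vector_size rows multiply the input, the rest multiply the state.
W = k_[:vector_size, :]   # shape (3, 2)
U = k_[vector_size:, :]   # shape (2, 2)

state_np = np.zeros([batch_size, num_units])  # same zero initial state

# BasicRNNCell defaults to tanh: act([input ; state] . kernel + bias).
my_output = np.tanh(X.dot(W) + state_np.dot(U) + b_)

print(np.allclose(my_output, output_))  # should print True
```

With the all-zero `X` above the check is trivially true; switching to `X = np.ones([batch_size, vector_size])` gives a non-trivial comparison.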