I am trying to implement SGD functionality in Python to update weights in Caffe manually, instead of using solver.step(). The goal is to match the weight updates performed by solver.step() with my own manual updates.
The setup is as follows: use the MNIST data. Set the random seed in solver.prototxt to random_seed: 52. Make sure momentum: 0.0 and base_lr: 0.01, lr_policy: "fixed". This is so that I can implement the plain SGD update equation (without momentum, regularization, etc.). The equation is simply:

W_{t+1} = W_t - mu * W_t_diff

where mu is the learning rate and W_t_diff is the gradient stored in the blob's diff.
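To make the update rule concrete, here is a minimal NumPy sketch of that equation; the array shapes and values are hypothetical stand-ins for a layer's weight blob and its diff:

import numpy as np

mu = 0.01                     # learning rate (base_lr)
W = np.random.randn(20, 10)   # W_t: current weights (hypothetical shape)
dW = np.random.randn(20, 10)  # W_t_diff: gradient of the loss w.r.t. W_t

W -= mu * dW                  # W_{t+1} = W_t - mu * W_t_diff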
Here are the two tests:
Test1: Use Caffe's forward() and backward() to compute the forward and backward passes. For every layer that contains weights, I do:
for k in weight_layer_idx:
    solver.net.layers[k].blobs[0].diff[...] *= lr  # weights
    solver.net.layers[k].blobs[1].diff[...] *= lr  # biases

Next, update the weights/biases with:

    solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
    solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

I run this for 5 iterations.
Test2: Run Caffe's solver.step(5).
Now, I expect the two tests to yield exactly the same weights after these iterations.
I save the weight values after each test and compute the norm of the difference between the weight vectors produced by the two tests, and they are not bit-exact. Is there something I might be missing?
Here is the entire code for reference:
import caffe
caffe.set_device(0)
caffe.set_mode_gpu()
import numpy as np
from copy import copy  # needed for the copy() calls below

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_solver_step = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01
momentum = 0.

# Get layer types
layer_types = []
for ll in solver.net.layers:
    layer_types.append(ll.type)

# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx, l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]

for it in range(1, niter + 1):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr
        solver.net.layers[k].blobs[1].diff[...] *= lr
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

# save the weights to compare later
w_fwdbwd_update = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_fwdbwd_update = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)

The final line, which compares the weights from the two tests, prints:
after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05

where I expected this difference to be 0.0.
Any ideas?
Answered on 2016-11-20 20:17:08
You are almost right; you only need to set the diffs to zero after every update. Caffe does not do this automatically, in order to give you the chance to implement batch accumulation (summing gradients over several batches for a single weight update, which can help when memory is too small for the desired batch size).
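For example, the zeroing could go right after the parameter update in the manual loop; this is a sketch using the question's own variable names (solver, weight_layer_idx, lr):

for k in weight_layer_idx:
    # plain SGD step: W -= lr * dW, b -= lr * db
    solver.net.layers[k].blobs[0].data[...] -= lr * solver.net.layers[k].blobs[0].diff
    solver.net.layers[k].blobs[1].data[...] -= lr * solver.net.layers[k].blobs[1].diff
    # zero the diffs so the next backward() starts from a clean gradient;
    # skipping this step is exactly what enables manual batch accumulation
    solver.net.layers[k].blobs[0].diff[...] = 0
    solver.net.layers[k].blobs[1].diff[...] = 0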
Another possible problem is the use of cuDNN: its convolution implementation is non-deterministic (or at least the way it is configured in Caffe is). In general this should not be a problem, but in your case it produces slightly different results on every run, and therefore different weights. If you compiled Caffe with cuDNN, you can simply switch to CPU mode to prevent this while testing.
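For example, using the standard pycaffe call:

import caffe
caffe.set_mode_cpu()  # sidestep cuDNN's non-deterministic GPU kernels while testing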
In addition, the solver parameters play a part in the weight update; as mentioned above, these include base_lr, momentum, weight_decay and lr_policy. In the net itself, make sure no learning-rate multipliers are in effect: commonly the biases learn at twice the rate of the weights, but that is not the behavior your implementation has. So you need to make sure both multipliers are set to one in the layer definitions:
param {
  lr_mult: 1  # weight lr multiplier
}
param {
  lr_mult: 1  # bias lr multiplier
}

Last but not least, here is an example of what your code could look like with momentum, weight decay and lr_mult taken into account. Run in CPU mode, this produces the expected output (no difference):
import caffe
caffe.set_device(0)
caffe.set_mode_cpu()
import numpy as np

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = solver.net.layers[1].blobs[0].data.copy()
b_solver_step = solver.net.layers[1].blobs[1].data.copy()

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
base_lr = 0.01
momentum = 0.9
weight_decay = 0.0005
lr_w_mult = 1
lr_b_mult = 2

# one momentum history entry per parameter blob
momentum_hist = {}
for layer in solver.net.params:
    m_w = np.zeros_like(solver.net.params[layer][0].data)
    m_b = np.zeros_like(solver.net.params[layer][1].data)
    momentum_hist[layer] = [m_w, m_b]

for i in range(niter):
    solver.net.forward()
    solver.net.backward()
    for layer in solver.net.params:
        # V_{t+1} = momentum * V_t + lr * (dW + weight_decay * W)
        momentum_hist[layer][0] = momentum_hist[layer][0] * momentum + \
            (solver.net.params[layer][0].diff + weight_decay * solver.net.params[layer][0].data) * base_lr * lr_w_mult
        momentum_hist[layer][1] = momentum_hist[layer][1] * momentum + \
            (solver.net.params[layer][1].diff + weight_decay * solver.net.params[layer][1].data) * base_lr * lr_b_mult
        # W_{t+1} = W_t - V_{t+1}
        solver.net.params[layer][0].data[...] -= momentum_hist[layer][0]
        solver.net.params[layer][1].data[...] -= momentum_hist[layer][1]
        # zero the diffs so gradients do not accumulate across iterations
        solver.net.params[layer][0].diff[...] *= 0
        solver.net.params[layer][1].diff[...] *= 0

# save the weights to compare later
w_fwdbwd_update = solver.net.layers[1].blobs[0].data.copy()
b_fwdbwd_update = solver.net.layers[1].blobs[1].data.copy()

# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)

Source: https://stackoverflow.com/questions/36459266