
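RMSProp stands for Root Mean Square Propagation and is an optimizer built on gradient descent. It belongs to the family of adaptive optimization algorithms, which tend to train faster and more efficiently than a fixed learning rate. We will study RMSProp by first looking at RProp, understanding its shortcomings, and then seeing how RMSProp addresses them.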


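RMSProp was first proposed by Geoffrey Hinton, often called the father of backpropagation. It works by changing the step size at every iteration, which dampens oscillations and reaches the minimum faster. It uses a decaying moving average of the squared gradients, so older gradients are gradually forgotten while more recent gradients carry more weight in the calculation.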

From Rprop to RMSprop

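RProp, or resilient propagation, was introduced to deal with gradients whose magnitudes differ widely. It addresses this with an adaptive learning rate that depends only on the signs of the last two gradients: RProp compares the sign of the previous gradient with the sign of the current gradient and adjusts the step size accordingly. The pseudocode below makes this clearer.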

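for t in range(1, n_iter):
    if dw[t-1] * dw[t] > 0:
        # same sign: grow the step, capped at max_step_size
        step_size = min(step_size * increment_factor, max_step_size)
    elif dw[t-1] * dw[t] < 0:
        # sign flipped: shrink the step, floored at min_step_size
        step_size = max(step_size * decrement_factor, min_step_size)
    # move against the gradient using only its sign
    w[t] = w[t-1] - sign(dw[t]) * step_size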

w[t] -> weight at iteration t

dw[t] -> gradient with respect to the weight at iteration t

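If the previous and current gradients have the same sign, the step size is increased by multiplying it by the increment factor, usually a number between 1 and 2. If the signs differ, the step size is reduced by multiplying it by the decrement factor, usually 0.5.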
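To make the rule concrete, here is a minimal runnable sketch of RProp on the one-dimensional function f(w) = w**2. The function name `rprop_minimize`, the default factors (1.2 and 0.5), and the step-size bounds are assumptions chosen for illustration; they are not taken from the article.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def rprop_minimize(grad, w0, n_iter=50, step_size=0.1,
                   increment_factor=1.2, decrement_factor=0.5,
                   min_step_size=1e-6, max_step_size=1.0):
    """Minimize a 1-D function with RProp, given its gradient function."""
    w = w0
    prev_grad = 0.0          # no previous gradient on the first iteration
    history = [w]
    for _ in range(n_iter):
        g = grad(w)
        if prev_grad * g > 0:
            # same sign as last time: speed up
            step_size = min(step_size * increment_factor, max_step_size)
        elif prev_grad * g < 0:
            # sign flipped, so we overshot: slow down
            step_size = max(step_size * decrement_factor, min_step_size)
        # only the sign of the gradient is used, never its magnitude
        w = w - sign(g) * step_size
        prev_grad = g
        history.append(w)
    return w, history

# gradient of f(w) = w**2 is 2*w
w_final, history = rprop_minimize(lambda w: 2.0 * w, w0=3.0)
print(abs(w_final))
```

Because the update ignores the gradient magnitude, the step grows while the iterate keeps moving in one direction and halves every time it jumps over the minimum.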

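The problem with RProp is that it does not work well with mini-batches, because it violates the central idea behind mini-batch gradient descent. When the learning rate is small enough, mini-batch gradient descent effectively averages the gradients over successive mini-batches. RProp does not. For example, suppose nine mini-batches give a positive gradient of +0.1 and the tenth gives a gradient of -0.9. Ideally the gradients should average out and roughly cancel. With RProp, however, the weight is incremented nine times and decremented only once by about the same amount, so the weight ends up changing a great deal.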
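The arithmetic of this example can be checked with a short sketch. As a simplifying assumption, the sign-based update below uses a fixed step size rather than RProp's adaptive one; with the adaptive rule the step would grow during the nine same-sign updates, making the drift larger still.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

# nine mini-batch gradients of +0.1 followed by one of -0.9
grads = [0.1] * 9 + [-0.9]

# plain mini-batch gradient descent: magnitudes matter, so updates cancel
learning_rate = 0.01
sgd_w = 0.0
for g in grads:
    sgd_w -= learning_rate * g

# sign-based update (RProp-style, fixed step for simplicity)
step = 0.01
rprop_w = 0.0
for g in grads:
    rprop_w -= step * sign(g)

print("gradient descent net change:", sgd_w)   # approximately 0
print("sign-based net change:", rprop_w)       # approximately -0.08
```

The gradient-descent weight ends where it started, while the sign-based weight moves by eight step sizes in one direction.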

Ideally, then, we want a technique that applies a moving-average filter to overcome this problem while keeping the robustness and efficiency of RProp. This leads us to RMSProp.

Root Mean Square Propagation (RMSProp)


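$$V_{dw} = \beta \, V_{dw} + (1 - \beta) \, dw^2$$

$$w = w - \frac{\alpha}{\sqrt{V_{dw}} + \epsilon} \, dw$$

The equations above define RMSProp. In the first equation, V_dw is a moving average of the squared gradients, so it carries information about previous gradients. The parameter β controls how many previous terms effectively contribute through the first term, β V_dw. For most practical applications β = 0.9, which means roughly the last 10 terms are taken into account. The second term, (1 - β) dw², brings in the square of the current gradient. In the second equation, α is the learning rate and ε is a small constant that prevents division by zero. Intuitively, because the learning rate is divided by the root of V_dw, a small V_dw increases the effective learning rate and a large V_dw decreases it. So when the gradients are large, the technique scales the updates down, which reduces oscillations.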
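The claim that β = 0.9 corresponds to roughly 10 terms can be illustrated numerically. Unrolling the moving average shows that the squared gradient seen k steps ago enters V_dw with weight (1 - β) β^k. The sketch below uses the common rule of thumb 1 / (1 - β) for the effective window; the variable names are illustrative only.

```python
beta = 0.9

# weight given to the squared gradient seen k steps ago
weights = [(1.0 - beta) * beta**k for k in range(50)]

# rule-of-thumb window length of an exponential moving average
effective_window = 1.0 / (1.0 - beta)

# fraction of the total weight carried by the 10 most recent terms
share_last_10 = sum(weights[:10])

print("effective window:", effective_window)
print("share of the 10 most recent terms:", round(share_last_10, 3))
```

With β = 0.9 the ten most recent squared gradients carry about 65% of the total weight, and each older term counts 10% less than the one after it.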

Gradient Descent with RMSProp

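Gradient descent is an optimization algorithm used to train machine learning models. It repeatedly follows the gradient to find a minimum of the cost function. RMSProp is an improved form of gradient descent that scales each step by a decaying moving average of squared gradients instead of relying on the current gradient alone.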
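One consequence of this scaling is easy to see on the very first update, when the moving average starts from zero. The sketch below compares a plain gradient descent step with an RMSProp step for gradients of very different magnitude. The helper names `plain_gd_step` and `rmsprop_first_step` and the values of `lr`, `beta`, and `eps` are assumptions made for this illustration.

```python
from math import sqrt

def plain_gd_step(grad, lr):
    """Size of a plain gradient descent update."""
    return lr * grad

def rmsprop_first_step(grad, lr, beta=0.9, eps=1e-8):
    """Size of the first RMSProp update, with the moving average starting at 0."""
    v = (1.0 - beta) * grad**2
    return lr * grad / (sqrt(v) + eps)

lr = 0.01
for g in (0.001, 1.0, 100.0):
    print(g, plain_gd_step(g, lr), rmsprop_first_step(g, lr))
```

The plain step scales linearly with the gradient, spanning five orders of magnitude here, while the RMSProp step stays close to lr / sqrt(1 - beta) in all three cases.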



Objective function: f(x, y) = x^2 + y^2

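For simplicity, we bound the example inputs between -1 and +1.

The inputs are sampled at intervals of 0.1, and the resulting values are drawn both as a 3D surface and as a filled contour plot.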

from numpy import arange, meshgrid
import matplotlib.pyplot as plt

# a simple second-degree function
def function(x, y):
    return x**2.0 + y**2.0

# inputs sampled at every 0.1 interval
xaxis = arange(-1, 1, 0.1)
yaxis = arange(-1, 1, 0.1)
# meshgrid from the created x and y axes
x, y = meshgrid(xaxis, yaxis)
# evaluate the function at every grid point
results = function(x, y)

# create a 3D surface plot
figure = plt.figure()
axis = figure.add_subplot(projection='3d')
axis.plot_surface(x, y, results, cmap='jet')
# plot a filled contour with 100 levels in a separate figure
plt.figure()
plt.contourf(x, y, results, levels=100, cmap='jet')
plt.show()

Gradient Descent Optimization with RMSProp


from math import sqrt
from numpy import asarray,arange,meshgrid
from numpy.random import rand, seed
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# target function is a simple second-degree equation
def objective_function(x, y):
	return x**2.0 + y**2.0

# function to find the derivative of target function x**2 +y**2
def derivative_function(x, y):
	return asarray([x * 2.0, y * 2.0])

Next, we define the RMSProp routine. First, we initialize the starting point and the squared-gradient averages.

# RMSProp algorithm
def RMSProp(objective_function, derivative_function, values_range, n_iterations, step_size, Beta):
	# list of all solution points
	all_solutions_list = list()
	# initial point generation within the range
	current_solution_point = values_range[:, 0] + rand(len(values_range)) * (values_range[:, 1] - values_range[:, 0])
	# squared gradients average
	squared_gradient_avg = [0.0 for _ in range(values_range.shape[0])]

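We create a for loop that runs for the given number of iterations. Inside it, we compute the gradient and then loop over the variables to update the moving average of the squared gradient for each one.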

	#Gradient Descent Algorithm looped n times
	for n in range(n_iterations):
		gradient = derivative_function(current_solution_point[0], current_solution_point[1])
		# loop to calculate an average of the squared gradients
		for i in range(gradient.shape[0]):
			squared_gradient = gradient[i]**2.0
			squared_gradient_avg[i] = (squared_gradient_avg[i] * Beta) + (squared_gradient * (1.0-Beta))

A second inner loop computes the learning rate (alpha) for each variable and updates the corresponding weight.

		# update solution point with squared gradient average
		updated_solution = list()
		for j in range(current_solution_point.shape[0]):
			# per-variable learning rate
			alpha = step_size / (1e-8 + sqrt(squared_gradient_avg[j]))
			# update the j-th coordinate of the solution point
			updated_value = current_solution_point[j] - alpha * gradient[j]
			updated_solution.append(updated_value)

		# store updated solution
		current_solution_point = asarray(updated_solution)
		all_solutions_list.append(current_solution_point)
		# value of the function at updated solution point
		solution_eval_value = objective_function(current_solution_point[0], current_solution_point[1])
		# report progress for this iteration
		print('iteration: %d  |  (x,y) = (%s)  |  eval_value = %.5f' % (n, current_solution_point, solution_eval_value))
	return all_solutions_list
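The two inner loops operate on each variable independently, so the same update can also be written with NumPy array operations. The following is a sketch of such a variant, not the article's code: the name `rmsprop_vectorized`, the explicit `start` argument (used instead of a random starting point), and the `eps` parameter are assumptions made for this example.

```python
import numpy as np

def rmsprop_vectorized(derivative, start, n_iterations=50,
                       step_size=0.01, beta=0.99, eps=1e-8):
    """RMSProp with all variables updated at once; returns the visited points."""
    point = np.asarray(start, dtype=float)
    sq_avg = np.zeros_like(point)        # moving average of squared gradients
    path = [point.copy()]
    for _ in range(n_iterations):
        grad = derivative(point)
        # decaying average of the squared gradient, per variable
        sq_avg = beta * sq_avg + (1.0 - beta) * grad**2
        # per-variable step: step_size divided by the root of the average
        point = point - step_size * grad / (eps + np.sqrt(sq_avg))
        path.append(point.copy())
    return np.asarray(path)

# gradient of x**2 + y**2 at p = (x, y) is 2*p
path = rmsprop_vectorized(lambda p: 2.0 * p, start=[0.8, -0.5])
print(path[-1])
```

The returned array has one row per visited point, including the start, so it can be plotted directly as a trajectory.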

We initialize the random seed, the bounds, beta, and the step size, and then pass these values to the RMSProp function.

# seed random number generator with any value
seed(1)
# min and max bounds of the input
values_range = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# number of iterations
n_iterations = 50
# step size
step_size = 0.01
# decay factor (beta) for the moving average of squared gradients
Beta = 0.99
# RMSProp gradient descent
solutions = RMSProp(objective_function, derivative_function, values_range, n_iterations, step_size, Beta)



iteration: 0  |  (x,y) = ([0.54064896 0.340649  ])  |  eval_value = 0.40834
iteration: 1  |  (x,y) = ([0.24501018 0.27929536])  |  eval_value = 0.13804
iteration: 2  |  (x,y) = ([0.23935992 0.23417693])  |  eval_value = 0.11213
iteration: 3  |  (x,y) = ([0.19767195 0.19863849])  |  eval_value = 0.07853
iteration: 4  |  (x,y) = ([0.16964129 0.16964301])  |  eval_value = 0.05756
iteration: 5  |  (x,y) = ([0.14537973 0.145492  ])  |  eval_value = 0.04230
iteration: 6  |  (x,y) = ([0.12503593 0.12511178])  |  eval_value = 0.03129
iteration: 7  |  (x,y) = ([0.10769972 0.10776512])  |  eval_value = 0.02321
iteration: 8  |  (x,y) = ([0.09286006 0.09291479])  |  eval_value = 0.01726
iteration: 9  |  (x,y) = ([0.08010512 0.08015162])  |  eval_value = 0.01284
iteration: 10  |  (x,y) = ([0.06911364 0.06915332])  |  eval_value = 0.00956
iteration: 11  |  (x,y) = ([0.05962546 0.05965946])  |  eval_value = 0.00711
iteration: 12  |  (x,y) = ([0.05142626 0.05145547])  |  eval_value = 0.00529
iteration: 13  |  (x,y) = ([0.04433676 0.04436191])  |  eval_value = 0.00393
iteration: 14  |  (x,y) = ([0.03820535 0.03822702])  |  eval_value = 0.00292
iteration: 15  |  (x,y) = ([0.0329027  0.03292139])  |  eval_value = 0.00217
iteration: 16  |  (x,y) = ([0.02831784 0.02833396])  |  eval_value = 0.00160
iteration: 17  |  (x,y) = ([0.02435509 0.02436899])  |  eval_value = 0.00119
iteration: 18  |  (x,y) = ([0.02093171 0.0209437 ])  |  eval_value = 0.00088
iteration: 19  |  (x,y) = ([0.01797599 0.01798633])  |  eval_value = 0.00065
iteration: 20  |  (x,y) = ([0.01542569 0.0154346 ])  |  eval_value = 0.00048
iteration: 21  |  (x,y) = ([0.01322671 0.01323438])  |  eval_value = 0.00035
iteration: 22  |  (x,y) = ([0.01133204 0.01133864])  |  eval_value = 0.00026
iteration: 23  |  (x,y) = ([0.00970081 0.00970649])  |  eval_value = 0.00019
iteration: 24  |  (x,y) = ([0.0082975  0.00830238])  |  eval_value = 0.00014
iteration: 25  |  (x,y) = ([0.00709123 0.00709542])  |  eval_value = 0.00010
iteration: 26  |  (x,y) = ([0.00605519 0.00605879])  |  eval_value = 0.00007
iteration: 27  |  (x,y) = ([0.00516609 0.00516918])  |  eval_value = 0.00005
iteration: 28  |  (x,y) = ([0.00440374 0.00440639])  |  eval_value = 0.00004
iteration: 29  |  (x,y) = ([0.00375063 0.0037529 ])  |  eval_value = 0.00003
iteration: 30  |  (x,y) = ([0.00319159 0.00319353])  |  eval_value = 0.00002
iteration: 31  |  (x,y) = ([0.00271348 0.00271514])  |  eval_value = 0.00001
iteration: 32  |  (x,y) = ([0.00230496 0.00230637])  |  eval_value = 0.00001
iteration: 33  |  (x,y) = ([0.00195619 0.00195739])  |  eval_value = 0.00001
iteration: 34  |  (x,y) = ([0.0016587  0.00165973])  |  eval_value = 0.00001
iteration: 35  |  (x,y) = ([0.00140518 0.00140606])  |  eval_value = 0.00000
iteration: 36  |  (x,y) = ([0.00118933 0.00119008])  |  eval_value = 0.00000
iteration: 37  |  (x,y) = ([0.00100572 0.00100635])  |  eval_value = 0.00000
iteration: 38  |  (x,y) = ([0.00084967 0.00085021])  |  eval_value = 0.00000
iteration: 39  |  (x,y) = ([0.00071717 0.00071763])  |  eval_value = 0.00000
iteration: 40  |  (x,y) = ([0.00060477 0.00060516])  |  eval_value = 0.00000
iteration: 41  |  (x,y) = ([0.00050951 0.00050984])  |  eval_value = 0.00000
iteration: 42  |  (x,y) = ([0.00042885 0.00042912])  |  eval_value = 0.00000
iteration: 43  |  (x,y) = ([0.00036062 0.00036085])  |  eval_value = 0.00000
iteration: 44  |  (x,y) = ([0.00030295 0.00030315])  |  eval_value = 0.00000
iteration: 45  |  (x,y) = ([0.00025426 0.00025443])  |  eval_value = 0.00000
iteration: 46  |  (x,y) = ([0.00021319 0.00021333])  |  eval_value = 0.00000
iteration: 47  |  (x,y) = ([0.00017858 0.0001787 ])  |  eval_value = 0.00000
iteration: 48  |  (x,y) = ([0.00014945 0.00014954])  |  eval_value = 0.00000
iteration: 49  |  (x,y) = ([0.00012494 0.00012502])  |  eval_value = 0.00000

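In our example, the evaluated value rounds to 0.00000 from iteration 35 onward, so at the printed precision the minimum is reached at iteration 35.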

Visualizing RMSProp


x_axis = arange(values_range[0,0], values_range[0,1], 0.1)
y_axis = arange(values_range[1,0], values_range[1,1], 0.1)

x, y = meshgrid(x_axis, y_axis)
# calculate results
results = objective_function(x, y)
# create a filled contour plot
plt.contourf(x, y, results, levels=80, cmap='jet')

all_solution_values = asarray(solutions)
# plot to trace solution points with each iteration
plt.plot(all_solution_values[:, 0], all_solution_values[:, 1], '.-', color='w')


The complete code for RMSProp is given below.

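  • RMSProp is an adaptive learning rate optimization algorithm.
  • RMSProp dampens oscillations and therefore reaches the minimum faster.
  • RMSprop acts as a moving-average filter, taking previous gradients into account when it adjusts the learning rate.
  • RMSprop solves RProp's inability to handle mini-batches.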

