LM(Levenberg-Marquadrdt )算法在MATLAB中的实现及实例
LM算法简介考虑如下非线性最小二乘问题minf(x)=12∥r(x)∥2,\min \quad f(x) = \frac{1}{2}\Vert r(x)\Vert^2, minf(x)=21∥r(x)∥2,其中残差向量 r:Rn→Rr:\mathbb{R}^n \to \mathbb{R}r:Rn→R 为r(x)=(r1(x),r2(x),⋯ ,rm(x))T.r(x) = \big(r_1(
LM算法简介
考虑如下非线性最小二乘问题
min
f
(
x
)
=
1
2
∥
r
(
x
)
∥
2
,
\min \quad f(x) = \frac{1}{2}\Vert r(x)\Vert^2,
minf(x)=21∥r(x)∥2,
其中残差向量
r
:
R
n
→
R
r:\mathbb{R}^n \to \mathbb{R}
r:Rn→R 为
r
(
x
)
=
(
r
1
(
x
)
,
r
2
(
x
)
,
⋯
,
r
m
(
x
)
)
T
.
r(x) = \big(r_1(x),r_2(x),\cdots,r_m(x)\big)^T.
r(x)=(r1(x),r2(x),⋯,rm(x))T.
r
r
r 的雅可比矩阵
J
∈
R
m
×
n
J\in \mathbb{R}^{m\times n}
J∈Rm×n 为
J
(
x
)
=
[
∂
r
j
∂
x
i
]
=
[
∇
r
1
(
x
)
T
∇
r
2
(
x
)
T
⋯
∇
r
m
(
x
)
T
]
J(x)=\left[\frac{\partial r_j}{\partial x_i}\right]=\left[\begin{array}{c} \nabla r_1(x)^T\\ \nabla r_2(x)^T\\ \cdots\\ \nabla r_m(x)^T \end{array}\right]
J(x)=[∂xi∂rj]=⎣⎢⎢⎡∇r1(x)T∇r2(x)T⋯∇rm(x)T⎦⎥⎥⎤
简单计算有
∇
f
(
x
)
=
J
(
x
)
T
r
(
x
)
,
∇
2
f
(
x
)
=
J
T
(
x
)
J
(
x
)
+
∑
j
=
1
m
r
j
(
x
)
∇
2
r
j
(
x
)
.
\nabla f(x) = \textcolor{blue}{J(x)^Tr(x)},\\ \nabla ^2 f(x) = \textcolor{blue}{J^T(x)J(x)} + \sum_{j=1}^{m}r_j(x)\nabla^2 r_j(x).
∇f(x)=J(x)Tr(x),∇2f(x)=JT(x)J(x)+j=1∑mrj(x)∇2rj(x).
在实际计算中, 我们丢掉
∇
2
f
(
x
)
\nabla ^2f(x)
∇2f(x) 中的第二项, 也就是说
∇
2
f
(
x
)
≈
J
T
(
x
)
J
(
x
)
.
\nabla ^2 f(x) \approx J^T(x)J(x).
∇2f(x)≈JT(x)J(x).
利用标准的 Newton 法, 搜索方向
p
p
p 满足
∇
2
f
(
x
k
)
p
=
−
∇
f
(
x
k
)
.
\nabla^2f(x_k)p = -\nabla f(x_k).
∇2f(xk)p=−∇f(xk). 即求解
J
k
T
J
k
p
=
−
J
k
T
r
k
.
J_k^TJ_kp=-J_k^Tr_k.
JkTJkp=−JkTrk.
上述问题等价于求解子问题
min
p
1
2
∥
J
k
p
+
r
k
∥
2
.
\min \limits_{p}\quad \frac{1}{2}\Vert J_kp+r_k\Vert^2.
pmin21∥Jkp+rk∥2.
我们采用信赖域框架来解决此问题, 即
min
p
1
2
∥
J
k
p
+
r
k
∥
2
,
s
.
t
.
∥
p
∥
≤
Δ
k
,
\min \limits_{p} \quad \frac{1}{2}\Vert J_kp+r_k\Vert ^2,\quad s.t. \; \Vert p\Vert \leq \Delta_k,
pmin21∥Jkp+rk∥2,s.t.∥p∥≤Δk,
简单证明有上式等价于:
(
J
T
J
+
λ
I
)
p
L
M
=
−
J
T
r
,
λ
(
Δ
−
∥
p
L
M
)
=
0.
(J^TJ+\lambda I)p^{LM} = - J^Tr,\\ \lambda (\Delta -\Vert p^{LM})=0.
(JTJ+λI)pLM=−JTr,λ(Δ−∥pLM)=0.
求解此子问题的具体算法为
需要注意的是, 实际计算中一般取2$\sim$3 次迭代即可, 不需要迭代至收敛.
**(也可以不使用Cholesky分解, 使用SVD,QR等方法)
**
信赖域的整体算法为
注意:其中1.3换成之前给出的算法.
LM算法在MATLAB中的实现
问题描述
Write a code to use the Levenberg-Marquadrdt algorithm to fit an exponential function of the form
f
(
x
;
θ
)
=
θ
1
e
θ
2
x
f(x;\theta)=\theta_1e^{\theta_2x}
f(x;θ)=θ1eθ2x to the data
0
,
1
,
2
,
3
,
4
,
5
,
5.2
,
4.5
,
2.7
,
2.5
,
2.1
,
1.9.
0,1,2,3,4,5,\\ 5.2, 4.5, 2.7, 2.5, 2.1, 1.9.
0,1,2,3,4,5,5.2,4.5,2.7,2.5,2.1,1.9.
(The first list gives
x
(
i
)
x^{(i)}
x(i); the second list gives
y
(
i
)
.
y^{(i)}.
y(i).) Plot your model
f
^
(
x
;
θ
)
\hat{f}(x;\theta)
f^(x;θ) versus
x
x
x, along with the data points.
问题分析
记
x
=
(
0
,
1
,
2
,
3
,
4
,
5
)
T
∈
R
6
,
y
=
(
5.2
,
4.5
,
2.7
,
2.5
,
2.1
,
1.9
)
T
∈
R
6
,
f
(
x
;
θ
)
=
θ
1
e
θ
2
x
,
x = (0,1,2,3,4,5)^T\in \mathbb{R}^6, y = (5.2, 4.5, 2.7, 2.5, 2.1, 1.9)^T\in\mathbb{R}^6,f(x;\theta) = \theta_1e^{\theta_2 x},
x=(0,1,2,3,4,5)T∈R6,y=(5.2,4.5,2.7,2.5,2.1,1.9)T∈R6,f(x;θ)=θ1eθ2x, 优化问题为
min
θ
1
,
θ
2
1
2
∥
f
(
x
;
θ
)
−
y
∥
2
,
\min\limits_{\theta_1,\theta_2}\quad \frac{1}{2}\Vert f(x;\theta) - y\Vert^2,
θ1,θ2min21∥f(x;θ)−y∥2,
即
r
(
θ
)
=
f
(
x
;
θ
)
−
y
,
r
i
(
θ
)
=
f
(
x
i
;
θ
)
−
y
i
,
∀
i
.
r(\theta)=f(x;\theta)-y, \quad r_i(\theta) = f(x_i;\theta) - y_i, \forall\; i.
r(θ)=f(x;θ)−y,ri(θ)=f(xi;θ)−yi,∀i.
计算
r
r
r 的 Jacobi 矩阵
J
∈
R
6
×
2
,
J\in \mathbb{R}^{6\times 2},
J∈R6×2, 有
J
(
θ
)
=
[
∂
r
j
∂
θ
i
]
=
[
∇
r
1
(
θ
)
T
∇
r
2
(
θ
)
T
⋯
∇
r
6
(
θ
)
T
]
=
[
e
θ
2
x
1
θ
1
x
1
e
θ
2
x
1
e
θ
2
x
2
θ
1
x
2
e
θ
2
x
2
⋯
⋯
e
θ
2
x
6
θ
1
x
6
e
θ
2
x
6
]
J(\theta)=\left[\frac{\partial r_j}{\partial \theta_i}\right]=\left[ \nabla r_1(\theta)^T\\ \nabla r_2(\theta)^T\\ \cdots\\ \nabla r_6(\theta)^T \right] = \left[\begin{array}{cc} e^{\theta_2 x_1}&\theta_1 x_1 e^{\theta_2 x_1}\\ e^{\theta_2 x_2}&\theta_1 x_2 e^{\theta_2 x_2}\\ \cdots&\cdots\\ e^{\theta_2 x_6}&\theta_1 x_6 e^{\theta_2 x_6} \end{array}\right]
J(θ)=[∂θi∂rj]=[∇r1(θ)T∇r2(θ)T⋯∇r6(θ)T]=⎣⎢⎢⎡eθ2x1eθ2x2⋯eθ2x6θ1x1eθ2x1θ1x2eθ2x2⋯θ1x6eθ2x6⎦⎥⎥⎤
问题求解
我们采用 Trust Region Method 来求解此问题我们取
θ
i
n
i
t
=
(
1
,
1
)
T
,
Δ
i
n
i
t
=
1
,
λ
(
0
)
=
0.01
,
\theta_{init} = (1,1)^T, \Delta_{init} = 1, \lambda^{(0)}=0.01,
θinit=(1,1)T,Δinit=1,λ(0)=0.01, Algorithm1 的迭代次数为 3, 总的迭代次数为 100,
θ
1
=
5.2109
,
θ
2
=
−
0.2341.
\textcolor{red}{\theta_1 = 5.2109, \quad \theta_2 =-0.2341.}
θ1=5.2109,θ2=−0.2341.
拟合的函数图像如图所示:
求解子问题的程序为
function p = Subproblem(J, r, lambda, delta)
% x is the solution of subproblem, lambda is the value after updating
for j = 1:3
[pl,R] = cholesky(J'*J+lambda*eye(2), -J'*r);
ql = huidaifa(R', pl);
diff = (norm(pl)/norm(ql))^2 * (norm(pl)-delta) / delta;
lambda = lambda + diff;
end
[p, ~] = cholesky(J'*J+lambda*eye(2), -J'*r);
end
Cholesky分解程序为
function [x,L]=cholesky(A,b)
%用平方根法解对称正定矩阵Ax=b
%A=LL'
%Ly=b;
%L'x=y
[m,n]=size(A);
if m~=n
fprintf('Matrix is not a square matrix');
return;
end
for k=1:n
A(k,k)=sqrt(A(k,k));
A(k+1:n,k)=A(k+1:n,k)/A(k,k);
for j=k+1:n
A(j:n,j)=A(j:n,j)-A(j:n,k)*A(j,k);
end
end
L = tril(A);
%step2
for j=1:n-1
b(j)=b(j)/A(j,j);
b(j+1:n)=b(j+1:n)-b(j)*A(j+1:n,j);
end
b(n)=b(n)/A(n,n);
%step3
for j=n:-1:2
b(j)=b(j)/A(j,j);
b(1:j-1)=b(1:j-1)-b(j)*A(j,1:j-1)';
end
b(1)=b(1)/A(1,1);
x=b;
% plot(x)
end
解上三角矩阵程序为
function x=huidaifa(U,y)
%% 解上三角矩阵Ux=y
[~,n]=size(U);
for j=n:-1:2
y(j,1)=y(j,1)/U(j,j);
y(1:j-1,1)=y(1:j-1,1)-y(j,1)*U(1:j-1,j);
end
y(1,1)=y(1,1)/U(1,1);
x=y;
end
主程序为
function theta = LM(x)
%% Levenberg-Marquadrdt ALgorithm to solve Least-Square problem
% x: input vector
% x = [0, 1, 2, 3, 4, 5;
% 5.2, 4.5, 2.7, 2.5, 2.1, 1.9]
% f(x) = theta_1 exp(theta_2 x[1,:]), where theta_1, theta2 is parameters;
% We try to use LM method to learn the parameters theta_1, theta_2;
if nargin < 1
x = [0, 1, 2, 3, 4, 5; 5.2, 4.5, 2.7, 2.5, 2.1, 1.9];
theta_init = [1;1]; % initial value
lambda_init = 0.01; % initialize the parameter lambda of subproblem
delta_init = 1; % the radius of trust region
delta_max = 4;
end
theta = theta_init;
delta = delta_init;
maxiter = 100;
eta1 = 0.25;
eta2 = 0.75;
eta = 0;
x_input = x(1, :)';
x_output = x(2, :)';
% Calculate jacobi matrix
J = [exp(theta(2)*x_input), theta(1) * x_input .* exp(theta(2)*x_input)];
x_pred = theta(1) * exp(theta(2)*x_input);
r = x_pred - x_output;
i = 0;
eps = 1e-10;
p = [1; 1];
theta_best = theta_init;
loss_best = inf;
while i < maxiter
if norm(p) < eps
disp(i);
break;
end
p = Subproblem(J, r, lambda_init, delta);
% evalute rho
theta_tmp = theta + p;
x_pred_tmp = theta_tmp(1) * exp(theta_tmp(2)*x_input);
f_pred = sum((x_pred_tmp - x_output).^2);
x_pred_tmp2 = theta(1) * exp(theta(2)*x_input);
f_pred2 = sum((x_pred_tmp2 - x_output).^2);
rho = (f_pred-f_pred2) / (p'*J'*r + 0.5*p'*(J'*J)*p);
% update delta
if rho < eta1
delta = eta1*delta;
elseif rho>eta2 && abs(norm(p)-delta)<1e-8
delta = min(2*delta,delta_max);
else
% do nothing;
end
if rho > eta
theta = theta_tmp;
J = [exp(theta(2)*x_input), theta(1) * x_input .* exp(theta(2)*x_input)];
x_pred = x_pred_tmp;
r = x_pred - x_output;
else
% do nothing;
end
if f_pred < loss_best
theta_best = theta_tmp;
loss_best = f_pred;
end
i = i + 1;
end
theta = theta_best;
x_pred = theta(1)*exp(theta(2)*x_input);
plot(x_input, x_pred, 'r*', x_input, x_output, 'b*');
hold on;
xx = 0:0.01:5;
yy = theta(1) * exp(theta(2)*xx);
plot(xx,yy,'r--');
xlabel('x')
ylabel('y')
xlim([-0.2,5.2]);
ylim([-0.2,6.2]);
legend('pred', 'true');
title('Result');
axis on;
set(gca,'xtick',0:1:5);
set(gca,'xticklabel',{0,1,2,3,4,5});
set(gca,'ytick',0:1:6);
set(gca,'yticklabel',{0,1,2,3,4,5,6});
fprintf('loss:%.3f', loss_best);
end
开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!
更多推荐
所有评论(0)