LM(Levenberg-Marquadrdt )算法在MATLAB中的实现及实例

LM算法简介考虑如下非线性最小二乘问题min⁡f(x)=12∥r(x)∥2,\min \quad f(x) = \frac{1}{2}\Vert r(x)\Vert^2, minf(x)=21∥r(x)∥2,其中残差向量 r:Rn→Rr:\mathbb{R}^n \to \mathbb{R}r:Rn→R 为r(x)=(r1(x),r2(x),⋯ ,rm(x))T.r(x) = \big(r_1(

waitingwinter

18618人浏览 · 2020-05-15 15:27:51

waitingwinter · 2020-05-15 15:27:51 发布

LM算法简介

考虑如下非线性最小二乘问题 $\min \quad f(x) = \frac{1}{2}\Vert r(x)\Vert^2,$
其中残差向量 $r:\mathbb{R}^n \to \mathbb{R}$ 为
$\big(r_1(x),r_2(x),\cdots,r_m(x)\big)^T.$
$r$ 的雅可比矩阵 $J\in \mathbb{R}^{m\times n}$ 为 $J(x)=\left[\frac{\partial r_j}{\partial x_i}\right]=\left[\begin{array}{c} \nabla r_1(x)^T\\ \nabla r_2(x)^T\\ \cdots\\ \nabla r_m(x)^T \end{array}\right]$
简单计算有
$\nabla f(x) = \textcolor{blue}{J(x)^Tr(x)},\\ \nabla ^2 f(x) = \textcolor{blue}{J^T(x)J(x)} + \sum_{j=1}^{m}r_j(x)\nabla^2 r_j(x).$
在实际计算中, 我们丢掉 $\nabla ^2f(x)$ 中的第二项, 也就是说
$\nabla ^2 f(x) \approx J^T(x)J(x).$
利用标准的 Newton 法, 搜索方向 $p$ 满足 $\nabla^2f(x_k)p = -\nabla f(x_k).$ 即求解 $J_k^TJ_kp=-J_k^Tr_k.$
上述问题等价于求解子问题
$\min \limits_{p}\quad \frac{1}{2}\Vert J_kp+r_k\Vert^2.$
我们采用信赖域框架来解决此问题, 即
$\min \limits_{p} \quad \frac{1}{2}\Vert J_kp+r_k\Vert ^2,\quad s.t. \; \Vert p\Vert \leq \Delta_k,$
简单证明有上式等价于:
$(J^TJ+\lambda I)p^{LM} = - J^Tr,\\ \lambda (\Delta -\Vert p^{LM})=0.$
求解此子问题的具体算法为
在这里插入图片描述
需要注意的是, 实际计算中一般取2$\sim$3 次迭代即可, 不需要迭代至收敛.
**(也可以不使用Cholesky分解, 使用SVD,QR等方法)
**
信赖域的整体算法为

注意：其中1.3换成之前给出的算法.

LM算法在MATLAB中的实现

问题描述

Write a code to use the Levenberg-Marquadrdt algorithm to fit an exponential function of the form $f(x;\theta)=\theta_1e^{\theta_2x}$ to the data
$0,1,2,3,4,5,\\ 5.2, 4.5, 2.7, 2.5, 2.1, 1.9.$
(The first list gives $x^{(i)}$ ; the second list gives $y^{(i)}.$ ) Plot your model $\hat{f}(x;\theta)$ versus $x$ , along with the data points.

问题分析

记 $(0,1,2,3,4,5)^T\in \mathbb{R}^6, y = (5.2, 4.5, 2.7, 2.5, 2.1, 1.9)^T\in\mathbb{R}^6,f(x;\theta) = \theta_1e^{\theta_2 x},$ 优化问题为
$\min\limits_{\theta_1,\theta_2}\quad \frac{1}{2}\Vert f(x;\theta) - y\Vert^2,$
即 $r(\theta)=f(x;\theta)-y, \quad r_i(\theta) = f(x_i;\theta) - y_i, \forall\; i.$
计算 $r$ 的 Jacobi 矩阵 $J\in \mathbb{R}^{6\times 2},$ 有
$J(\theta)=\left[\frac{\partial r_j}{\partial \theta_i}\right]=\left[ \nabla r_1(\theta)^T\\ \nabla r_2(\theta)^T\\ \cdots\\ \nabla r_6(\theta)^T \right] = \left[\begin{array}{cc} e^{\theta_2 x_1}&\theta_1 x_1 e^{\theta_2 x_1}\\ e^{\theta_2 x_2}&\theta_1 x_2 e^{\theta_2 x_2}\\ \cdots&\cdots\\ e^{\theta_2 x_6}&\theta_1 x_6 e^{\theta_2 x_6} \end{array}\right]$

问题求解

我们采用 Trust Region Method 来求解此问题我们取 $\theta_{init} = (1,1)^T, \Delta_{init} = 1, \lambda^{(0)}=0.01,$ Algorithm1 的迭代次数为 3, 总的迭代次数为 100, $\textcolor{red}{\theta_1 = 5.2109, \quad \theta_2 =-0.2341.}$
拟合的函数图像如图所示:
在这里插入图片描述

求解子问题的程序为

function p = Subproblem(J, r, lambda, delta)
    % x is the solution of subproblem, lambda is the value after updating
    for j = 1:3
        [pl,R] = cholesky(J'*J+lambda*eye(2), -J'*r);
        ql = huidaifa(R', pl);
        diff = (norm(pl)/norm(ql))^2 * (norm(pl)-delta) / delta;
        lambda = lambda + diff;
    end
    [p, ~] = cholesky(J'*J+lambda*eye(2), -J'*r);
end

Cholesky分解程序为

function [x,L]=cholesky(A,b)
%用平方根法解对称正定矩阵Ax=b
%A=LL'
%Ly=b;
%L'x=y
[m,n]=size(A);
if m~=n
    fprintf('Matrix is not a square matrix');
    return;
end

for k=1:n
    A(k,k)=sqrt(A(k,k));
    A(k+1:n,k)=A(k+1:n,k)/A(k,k);
    for j=k+1:n
        A(j:n,j)=A(j:n,j)-A(j:n,k)*A(j,k);
    end
end
L = tril(A);
%step2
for j=1:n-1
    b(j)=b(j)/A(j,j);
    b(j+1:n)=b(j+1:n)-b(j)*A(j+1:n,j);
end
b(n)=b(n)/A(n,n);

%step3
for j=n:-1:2
    b(j)=b(j)/A(j,j);
    b(1:j-1)=b(1:j-1)-b(j)*A(j,1:j-1)';
end
b(1)=b(1)/A(1,1);
x=b;
% plot(x)
end

解上三角矩阵程序为

function x=huidaifa(U,y)
%% 解上三角矩阵Ux=y
[~,n]=size(U);
for j=n:-1:2
    y(j,1)=y(j,1)/U(j,j);
    y(1:j-1,1)=y(1:j-1,1)-y(j,1)*U(1:j-1,j);
end
y(1,1)=y(1,1)/U(1,1);
x=y;
end

主程序为

function theta = LM(x)
%% Levenberg-Marquadrdt ALgorithm to solve Least-Square problem
% x: input vector 
% x = [0,    1,   2,   3,   4,   5;
%      5.2, 4.5, 2.7, 2.5, 2.1, 1.9]
% f(x) = theta_1 exp(theta_2 x[1,:]), where theta_1, theta2 is parameters;
% We try to use LM method to learn the parameters theta_1, theta_2;

if nargin < 1 
    x = [0, 1, 2, 3, 4, 5; 5.2, 4.5, 2.7, 2.5, 2.1, 1.9];
    theta_init = [1;1]; % initial value
    lambda_init = 0.01; % initialize the parameter lambda of subproblem
    delta_init = 1; % the radius of trust region
    delta_max = 4;
end

theta = theta_init;
delta = delta_init;
maxiter = 100;
eta1 = 0.25;
eta2 = 0.75;
eta = 0;
x_input = x(1, :)';
x_output = x(2, :)';
% Calculate jacobi matrix
J = [exp(theta(2)*x_input), theta(1) * x_input .* exp(theta(2)*x_input)];
x_pred = theta(1) * exp(theta(2)*x_input);
r = x_pred - x_output;
i = 0;
eps = 1e-10;
p = [1; 1];
theta_best = theta_init;
loss_best = inf;
while i < maxiter
    if norm(p) < eps
        disp(i);
        break;
    end
    p = Subproblem(J, r, lambda_init, delta);
    % evalute rho
    theta_tmp = theta + p;
    x_pred_tmp = theta_tmp(1) * exp(theta_tmp(2)*x_input);
    f_pred = sum((x_pred_tmp - x_output).^2);
    x_pred_tmp2 = theta(1) * exp(theta(2)*x_input);
    f_pred2 = sum((x_pred_tmp2 - x_output).^2);
    rho = (f_pred-f_pred2) / (p'*J'*r + 0.5*p'*(J'*J)*p);
    % update delta 
    if rho < eta1
        delta = eta1*delta;
    elseif rho>eta2 && abs(norm(p)-delta)<1e-8
        delta = min(2*delta,delta_max);
    else
        % do nothing;
    end
    if rho > eta
        theta = theta_tmp;
        J = [exp(theta(2)*x_input), theta(1) * x_input .* exp(theta(2)*x_input)];
        x_pred = x_pred_tmp;
        r = x_pred - x_output;
    else
        % do nothing;
    end
    if f_pred < loss_best
        theta_best = theta_tmp;
        loss_best = f_pred;
    end
    i = i + 1;
end
theta = theta_best;
x_pred = theta(1)*exp(theta(2)*x_input);
plot(x_input, x_pred, 'r*', x_input, x_output, 'b*');
hold on;
xx = 0:0.01:5;
yy = theta(1) * exp(theta(2)*xx);
plot(xx,yy,'r--');
xlabel('x')
ylabel('y')
xlim([-0.2,5.2]);
ylim([-0.2,6.2]);
legend('pred', 'true');
title('Result');
axis on;
set(gca,'xtick',0:1:5); 
set(gca,'xticklabel',{0,1,2,3,4,5});
set(gca,'ytick',0:1:6); 
set(gca,'yticklabel',{0,1,2,3,4,5,6});
fprintf('loss:%.3f', loss_best);
end