LM算法简介

考虑如下非线性最小二乘问题 min ⁡ f ( x ) = 1 2 ∥ r ( x ) ∥ 2 , \min \quad f(x) = \frac{1}{2}\Vert r(x)\Vert^2, minf(x)=21r(x)2,
其中残差向量 r : R n → R r:\mathbb{R}^n \to \mathbb{R} r:RnR
r ( x ) = ( r 1 ( x ) , r 2 ( x ) , ⋯   , r m ( x ) ) T . r(x) = \big(r_1(x),r_2(x),\cdots,r_m(x)\big)^T. r(x)=(r1(x),r2(x),,rm(x))T.
r r r 的雅可比矩阵 J ∈ R m × n J\in \mathbb{R}^{m\times n} JRm×n J ( x ) = [ ∂ r j ∂ x i ] = [ ∇ r 1 ( x ) T ∇ r 2 ( x ) T ⋯ ∇ r m ( x ) T ] J(x)=\left[\frac{\partial r_j}{\partial x_i}\right]=\left[\begin{array}{c} \nabla r_1(x)^T\\ \nabla r_2(x)^T\\ \cdots\\ \nabla r_m(x)^T \end{array}\right] J(x)=[xirj]=r1(x)Tr2(x)Trm(x)T
简单计算有
∇ f ( x ) = J ( x ) T r ( x ) , ∇ 2 f ( x ) = J T ( x ) J ( x ) + ∑ j = 1 m r j ( x ) ∇ 2 r j ( x ) . \nabla f(x) = \textcolor{blue}{J(x)^Tr(x)},\\ \nabla ^2 f(x) = \textcolor{blue}{J^T(x)J(x)} + \sum_{j=1}^{m}r_j(x)\nabla^2 r_j(x). f(x)=J(x)Tr(x),2f(x)=JT(x)J(x)+j=1mrj(x)2rj(x).
在实际计算中, 我们丢掉 ∇ 2 f ( x ) \nabla ^2f(x) 2f(x) 中的第二项, 也就是说
∇ 2 f ( x ) ≈ J T ( x ) J ( x ) . \nabla ^2 f(x) \approx J^T(x)J(x). 2f(x)JT(x)J(x).
利用标准的 Newton 法, 搜索方向 p p p 满足 ∇ 2 f ( x k ) p = − ∇ f ( x k ) . \nabla^2f(x_k)p = -\nabla f(x_k). 2f(xk)p=f(xk). 即求解 J k T J k p = − J k T r k . J_k^TJ_kp=-J_k^Tr_k. JkTJkp=JkTrk.
上述问题等价于求解子问题
min ⁡ p 1 2 ∥ J k p + r k ∥ 2 . \min \limits_{p}\quad \frac{1}{2}\Vert J_kp+r_k\Vert^2. pmin21Jkp+rk2.
我们采用信赖域框架来解决此问题, 即
min ⁡ p 1 2 ∥ J k p + r k ∥ 2 , s . t .    ∥ p ∥ ≤ Δ k , \min \limits_{p} \quad \frac{1}{2}\Vert J_kp+r_k\Vert ^2,\quad s.t. \; \Vert p\Vert \leq \Delta_k, pmin21Jkp+rk2,s.t.pΔk,
简单证明有上式等价于:
( J T J + λ I ) p L M = − J T r , λ ( Δ − ∥ p L M ) = 0. (J^TJ+\lambda I)p^{LM} = - J^Tr,\\ \lambda (\Delta -\Vert p^{LM})=0. (JTJ+λI)pLM=JTr,λ(ΔpLM)=0.
求解此子问题的具体算法为
在这里插入图片描述
需要注意的是, 实际计算中一般取2$\sim$3 次迭代即可, 不需要迭代至收敛.
**(也可以不使用Cholesky分解, 使用SVD,QR等方法)
**
信赖域的整体算法为

注意:其中1.3换成之前给出的算法.

LM算法在MATLAB中的实现

问题描述

Write a code to use the Levenberg-Marquadrdt algorithm to fit an exponential function of the form f ( x ; θ ) = θ 1 e θ 2 x f(x;\theta)=\theta_1e^{\theta_2x} f(x;θ)=θ1eθ2x to the data
0 , 1 , 2 , 3 , 4 , 5 , 5.2 , 4.5 , 2.7 , 2.5 , 2.1 , 1.9. 0,1,2,3,4,5,\\ 5.2, 4.5, 2.7, 2.5, 2.1, 1.9. 0,1,2,3,4,5,5.2,4.5,2.7,2.5,2.1,1.9.
(The first list gives x ( i ) x^{(i)} x(i); the second list gives y ( i ) . y^{(i)}. y(i).) Plot your model f ^ ( x ; θ ) \hat{f}(x;\theta) f^(x;θ) versus x x x, along with the data points.

问题分析

x = ( 0 , 1 , 2 , 3 , 4 , 5 ) T ∈ R 6 , y = ( 5.2 , 4.5 , 2.7 , 2.5 , 2.1 , 1.9 ) T ∈ R 6 , f ( x ; θ ) = θ 1 e θ 2 x , x = (0,1,2,3,4,5)^T\in \mathbb{R}^6, y = (5.2, 4.5, 2.7, 2.5, 2.1, 1.9)^T\in\mathbb{R}^6,f(x;\theta) = \theta_1e^{\theta_2 x}, x=(0,1,2,3,4,5)TR6,y=(5.2,4.5,2.7,2.5,2.1,1.9)TR6,f(x;θ)=θ1eθ2x, 优化问题为
min ⁡ θ 1 , θ 2 1 2 ∥ f ( x ; θ ) − y ∥ 2 , \min\limits_{\theta_1,\theta_2}\quad \frac{1}{2}\Vert f(x;\theta) - y\Vert^2, θ1,θ2min21f(x;θ)y2,
r ( θ ) = f ( x ; θ ) − y , r i ( θ ) = f ( x i ; θ ) − y i , ∀    i . r(\theta)=f(x;\theta)-y, \quad r_i(\theta) = f(x_i;\theta) - y_i, \forall\; i. r(θ)=f(x;θ)y,ri(θ)=f(xi;θ)yi,i.
计算 r r r 的 Jacobi 矩阵 J ∈ R 6 × 2 , J\in \mathbb{R}^{6\times 2}, JR6×2,
J ( θ ) = [ ∂ r j ∂ θ i ] = [ ∇ r 1 ( θ ) T ∇ r 2 ( θ ) T ⋯ ∇ r 6 ( θ ) T ] = [ e θ 2 x 1 θ 1 x 1 e θ 2 x 1 e θ 2 x 2 θ 1 x 2 e θ 2 x 2 ⋯ ⋯ e θ 2 x 6 θ 1 x 6 e θ 2 x 6 ] J(\theta)=\left[\frac{\partial r_j}{\partial \theta_i}\right]=\left[ \nabla r_1(\theta)^T\\ \nabla r_2(\theta)^T\\ \cdots\\ \nabla r_6(\theta)^T \right] = \left[\begin{array}{cc} e^{\theta_2 x_1}&\theta_1 x_1 e^{\theta_2 x_1}\\ e^{\theta_2 x_2}&\theta_1 x_2 e^{\theta_2 x_2}\\ \cdots&\cdots\\ e^{\theta_2 x_6}&\theta_1 x_6 e^{\theta_2 x_6} \end{array}\right] J(θ)=[θirj]=[r1(θ)Tr2(θ)Tr6(θ)T]=eθ2x1eθ2x2eθ2x6θ1x1eθ2x1θ1x2eθ2x2θ1x6eθ2x6

问题求解

我们采用 Trust Region Method 来求解此问题我们取 θ i n i t = ( 1 , 1 ) T , Δ i n i t = 1 , λ ( 0 ) = 0.01 , \theta_{init} = (1,1)^T, \Delta_{init} = 1, \lambda^{(0)}=0.01, θinit=(1,1)T,Δinit=1,λ(0)=0.01, Algorithm1 的迭代次数为 3, 总的迭代次数为 100, θ 1 = 5.2109 , θ 2 = − 0.2341. \textcolor{red}{\theta_1 = 5.2109, \quad \theta_2 =-0.2341.} θ1=5.2109,θ2=0.2341.
拟合的函数图像如图所示:
在这里插入图片描述

求解子问题的程序为

function p = Subproblem(J, r, lambda, delta)
    % x is the solution of subproblem, lambda is the value after updating
    for j = 1:3
        [pl,R] = cholesky(J'*J+lambda*eye(2), -J'*r);
        ql = huidaifa(R', pl);
        diff = (norm(pl)/norm(ql))^2 * (norm(pl)-delta) / delta;
        lambda = lambda + diff;
    end
    [p, ~] = cholesky(J'*J+lambda*eye(2), -J'*r);
end

Cholesky分解程序为

function [x,L]=cholesky(A,b)
%用平方根法解对称正定矩阵Ax=b
%A=LL'
%Ly=b;
%L'x=y
[m,n]=size(A);
if m~=n
    fprintf('Matrix is not a square matrix');
    return;
end

for k=1:n
    A(k,k)=sqrt(A(k,k));
    A(k+1:n,k)=A(k+1:n,k)/A(k,k);
    for j=k+1:n
        A(j:n,j)=A(j:n,j)-A(j:n,k)*A(j,k);
    end
end
L = tril(A);
%step2
for j=1:n-1
    b(j)=b(j)/A(j,j);
    b(j+1:n)=b(j+1:n)-b(j)*A(j+1:n,j);
end
b(n)=b(n)/A(n,n);

%step3
for j=n:-1:2
    b(j)=b(j)/A(j,j);
    b(1:j-1)=b(1:j-1)-b(j)*A(j,1:j-1)';
end
b(1)=b(1)/A(1,1);
x=b;
% plot(x)
end

解上三角矩阵程序为

function x=huidaifa(U,y)
%% 解上三角矩阵Ux=y
[~,n]=size(U);
for j=n:-1:2
    y(j,1)=y(j,1)/U(j,j);
    y(1:j-1,1)=y(1:j-1,1)-y(j,1)*U(1:j-1,j);
end
y(1,1)=y(1,1)/U(1,1);
x=y;
end

主程序为

function theta = LM(x)
%% Levenberg-Marquadrdt ALgorithm to solve Least-Square problem
% x: input vector 
% x = [0,    1,   2,   3,   4,   5;
%      5.2, 4.5, 2.7, 2.5, 2.1, 1.9]
% f(x) = theta_1 exp(theta_2 x[1,:]), where theta_1, theta2 is parameters;
% We try to use LM method to learn the parameters theta_1, theta_2;

if nargin < 1 
    x = [0, 1, 2, 3, 4, 5; 5.2, 4.5, 2.7, 2.5, 2.1, 1.9];
    theta_init = [1;1]; % initial value
    lambda_init = 0.01; % initialize the parameter lambda of subproblem
    delta_init = 1; % the radius of trust region
    delta_max = 4;
end

theta = theta_init;
delta = delta_init;
maxiter = 100;
eta1 = 0.25;
eta2 = 0.75;
eta = 0;
x_input = x(1, :)';
x_output = x(2, :)';
% Calculate jacobi matrix
J = [exp(theta(2)*x_input), theta(1) * x_input .* exp(theta(2)*x_input)];
x_pred = theta(1) * exp(theta(2)*x_input);
r = x_pred - x_output;
i = 0;
eps = 1e-10;
p = [1; 1];
theta_best = theta_init;
loss_best = inf;
while i < maxiter
    if norm(p) < eps
        disp(i);
        break;
    end
    p = Subproblem(J, r, lambda_init, delta);
    % evalute rho
    theta_tmp = theta + p;
    x_pred_tmp = theta_tmp(1) * exp(theta_tmp(2)*x_input);
    f_pred = sum((x_pred_tmp - x_output).^2);
    x_pred_tmp2 = theta(1) * exp(theta(2)*x_input);
    f_pred2 = sum((x_pred_tmp2 - x_output).^2);
    rho = (f_pred-f_pred2) / (p'*J'*r + 0.5*p'*(J'*J)*p);
    % update delta 
    if rho < eta1
        delta = eta1*delta;
    elseif rho>eta2 && abs(norm(p)-delta)<1e-8
        delta = min(2*delta,delta_max);
    else
        % do nothing;
    end
    if rho > eta
        theta = theta_tmp;
        J = [exp(theta(2)*x_input), theta(1) * x_input .* exp(theta(2)*x_input)];
        x_pred = x_pred_tmp;
        r = x_pred - x_output;
    else
        % do nothing;
    end
    if f_pred < loss_best
        theta_best = theta_tmp;
        loss_best = f_pred;
    end
    i = i + 1;
end
theta = theta_best;
x_pred = theta(1)*exp(theta(2)*x_input);
plot(x_input, x_pred, 'r*', x_input, x_output, 'b*');
hold on;
xx = 0:0.01:5;
yy = theta(1) * exp(theta(2)*xx);
plot(xx,yy,'r--');
xlabel('x')
ylabel('y')
xlim([-0.2,5.2]);
ylim([-0.2,6.2]);
legend('pred', 'true');
title('Result');
axis on;
set(gca,'xtick',0:1:5); 
set(gca,'xticklabel',{0,1,2,3,4,5});
set(gca,'ytick',0:1:6); 
set(gca,'yticklabel',{0,1,2,3,4,5,6});
fprintf('loss:%.3f', loss_best);
end
Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐