I. Recursive Least Squares

1. What is recursive least squares?

The recursive least squares (RLS) algorithm is a method for computing a linear regression online. It computes the least squares regression coefficients recursively, without having to store all of the data.

 Specifically, each time a new sample arrives, the algorithm recursively updates the regression coefficients based on the samples already processed and the corresponding predictions. The model is thus updated with each new sample, without recomputing over all samples.

 The advantage of recursive least squares is that it computes the regression coefficients quickly without keeping the full data set, which makes it useful when processing large amounts of data.

2. The difference between least squares and recursive least squares

 Least squares is a standard regression method: it fits the data by finding the parameter estimates that minimize the sum of squared residuals. Recursive least squares is a refinement of it: it uses past data to estimate future values, computing the prediction at each time point by a recursion while continually updating the parameter estimates to minimize the current sum of squared errors.

 The difference is that least squares fits the whole model in one pass and is therefore an offline identification method, whereas recursive least squares builds the model step by step, updating the parameters and correcting the model as new data arrive, and is therefore an online identification method. Recursive least squares is better suited to prediction on time-series data and reflects the dynamics and trends of the data more faithfully.

3. Derivation of the recursive least squares algorithm

 From the least squares algorithm, the minimum estimate of the parameter vector $\pmb\theta$ is

$$\hat{\pmb{\theta}}(t)=(\pmb{H}_t^T\pmb{H}_t)^{-1}\pmb{H}_t^T\pmb{Y}_t\tag{1-1}$$

The derivation of this expression is given in the earlier post 【系统辨识】最小二乘估计 (least squares estimation). Make sure you understand where (1-1) comes from and what it means; that is essential for following the rest of the derivation. We now rewrite (1-1) as a recursive computation, in which the parameter estimate $\hat{\pmb{\theta}}(t)$ equals the previous estimate $\hat{\pmb{\theta}}(t-1)$ plus a correction term.
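As a quick numerical illustration of (1-1), the batch estimate can be computed directly with NumPy. The data below are synthetic and purely illustrative; all variable names are mine, not from the reference.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 3, 200                        # parameter dimension, number of samples
theta_true = np.array([1.0, -0.5, 0.25])

H = rng.standard_normal((t, n))      # regressor matrix H_t, rows are phi^T(j)
Y = H @ theta_true + 0.01 * rng.standard_normal(t)  # noisy outputs Y_t

# theta_hat = (H^T H)^{-1} H^T Y, eq. (1-1); solve() avoids an explicit inverse
theta_hat = np.linalg.solve(H.T @ H, H.T @ Y)
print(theta_hat)                     # close to theta_true
```

With 200 samples and small noise, the estimate recovers the true parameters to within a few thousandths.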
 First, define the non-decreasing matrix

$$\begin{aligned}\pmb{P}^{-1}(t)&=\pmb{H}^{T}_t\pmb{H}_t=\sum_{j=1}^{t}\pmb{\varphi}(j)\pmb{\varphi}^T(j)\\&=\pmb{P}^{-1}(t-1)+\pmb{\varphi}(t)\pmb{\varphi}^T(t).\end{aligned}\tag{1-2}$$

Unrolling this recursion gives

$$\begin{aligned}\pmb{P}^{-1}(t)&=\pmb{P}^{-1}(t-1)+\pmb{\varphi}(t)\pmb{\varphi}^T(t) \\&=\pmb P^{-1}(t-2)+\sum_{j=t-1}^{t}\pmb{\varphi}(j)\pmb{\varphi}^T(j) \\&=\cdots \\&=\pmb P^{-1}(0)+\sum_{j=1}^{t}\pmb{\varphi}(j)\pmb{\varphi}^T(j) \\&=\pmb P^{-1}(0)+\pmb{H}^{T}_t\pmb{H}_t. \end{aligned}$$

Comparing this with (1-2) shows that the initial value should be $\pmb P^{-1}(0)=\pmb 0$. In practice, however, it is taken to be a very small positive-definite matrix, $\pmb P^{-1}(0)=\pmb I/p_0$, i.e. $\pmb P(0)=p_0\pmb I$, where $p_0$ is a large constant such as $p_0=10^6$.
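The recursion (1-2) and its unrolled form can be checked numerically: accumulating the rank-one terms $\pmb\varphi(t)\pmb\varphi^T(t)$ starting from $\pmb P^{-1}(0)=\pmb I/p_0$ reproduces $\pmb H_t^T\pmb H_t$ up to the negligible initial term. A small sketch with synthetic regressors:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, p0 = 3, 50, 1e6
Phi = rng.standard_normal((t, n))          # rows are phi^T(j)

P_inv = np.eye(n) / p0                     # P^{-1}(0) = I / p0
for j in range(t):
    phi = Phi[j]
    P_inv = P_inv + np.outer(phi, phi)     # eq. (1-2)

# unrolled form: P^{-1}(t) = P^{-1}(0) + H_t^T H_t
print(np.max(np.abs(P_inv - Phi.T @ Phi))) # about 1/p0, i.e. negligible
```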
 Also, from the definitions of $\pmb{Y}_t$ and $\pmb{H}_t$ (given in 【系统辨识】最小二乘估计),

$$\pmb{Y}_t=\begin{bmatrix} y(1)\\y(2)\\\vdots\\y(t-1)\\y(t)\end{bmatrix}=\begin{bmatrix} \pmb{Y}_{t-1}\\y(t)\end{bmatrix}\in R^t,\qquad \pmb{H}_t=\begin{bmatrix} \pmb{\varphi}^T(1)\\ \pmb{\varphi}^T(2)\\\vdots\\\pmb{\varphi}^T(t-1)\\\pmb{\varphi}^T(t)\end{bmatrix}=\begin{bmatrix}\pmb{H}_{t-1}\\\pmb{\varphi}^T(t)\end{bmatrix}\in R^{t\times n}.$$

Substituting (1-2) into (1-1) gives
$$\begin{aligned}\hat{\pmb{\theta}}(t)&=(\pmb{H}^T_t\pmb{H}_t)^{-1}\pmb{H}^T_t\pmb{Y}_t=\pmb{P}(t)\pmb{H}^T_t\pmb{Y}_t\\&=\pmb{P}(t)\begin{bmatrix}\pmb{H}^T_{t-1}&\pmb{\varphi}(t)\end{bmatrix}\begin{bmatrix} \pmb{Y}_{t-1}\\y(t)\end{bmatrix} \\&=\pmb P(t)[\pmb{H}^T_{t-1}\pmb{Y}_{t-1}+\pmb{\varphi}(t)y(t)] \\&=\pmb P(t)[\pmb P^{-1}(t-1)\pmb P(t-1)\pmb{H}^T_{t-1}\pmb{Y}_{t-1}+\pmb{\varphi}(t)y(t)] \\&=\pmb P(t)[\pmb P^{-1}(t-1)\hat{\pmb{\theta}}(t-1)+\pmb{\varphi}(t)y(t)] \\&=\pmb P(t)\{[\pmb{P}^{-1}(t)-\pmb{\varphi}(t)\pmb{\varphi}^T(t)]\hat{\pmb{\theta}}(t-1)+\pmb{\varphi}(t)y(t)\} \\&=\hat{\pmb{\theta}}(t-1)-\pmb{P}(t)\pmb{\varphi}(t)\pmb{\varphi}^T(t)\hat{\pmb{\theta}}(t-1)+\pmb P(t)\pmb{\varphi}(t)y(t) \\&=\hat{\pmb{\theta}}(t-1)+\pmb{P}(t)\pmb{\varphi}(t)[y(t)-\pmb{\varphi}^T(t)\hat{\pmb{\theta}}(t-1)] \end{aligned}$$

Combining this with (1-2) yields the recursive least squares (RLS) algorithm for the parameter vector $\pmb\theta$:

$$\hat{\pmb{\theta}}(t)=\hat{\pmb{\theta}}(t-1)+\pmb{P}(t)\pmb{\varphi}(t)[y(t)-\pmb{\varphi}^T(t)\hat{\pmb{\theta}}(t-1)],\tag{1-3}$$

$$\pmb{P}^{-1}(t)=\pmb{P}^{-1}(t-1)+\pmb{\varphi}(t)\pmb{\varphi}^T(t),\quad \pmb P(0)=p_0\pmb I>0.\tag{1-4}$$
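A direct, if inefficient, implementation of (1-3)–(1-4) inverts $\pmb P^{-1}(t)$ at every step; it is useful mainly as a correctness check. A minimal sketch on synthetic data (the names and the test system are mine, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, p0 = 3, 300, 1e6
theta_true = np.array([0.5, -1.0, 2.0])
Phi = rng.standard_normal((T, n))
y = Phi @ theta_true + 0.01 * rng.standard_normal(T)

theta = np.ones(n) / p0            # small initial estimate, theta(0) = 1_n / p0
P_inv = np.eye(n) / p0             # P^{-1}(0) = I / p0
for t in range(T):
    phi = Phi[t]
    P_inv = P_inv + np.outer(phi, phi)              # eq. (1-4)
    P = np.linalg.inv(P_inv)                        # explicit inverse (avoided later)
    theta = theta + P @ phi * (y[t] - phi @ theta)  # eq. (1-3)

print(theta)  # close to theta_true
```

With $p_0$ large, the recursive estimate is numerically indistinguishable from the batch solution (1-1).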

4. An equivalent form of recursive least squares

 To avoid inverting the covariance matrix $\pmb P(t)$, one usually applies the matrix inversion formula and introduces a gain vector $\pmb L(t)=\pmb P(t)\pmb \varphi(t)$, converting (1-3) and (1-4) into an equivalent form.
 First, a matrix inversion lemma: let $\pmb A \in R^{n \times n}$, $\pmb B\in R^{n \times r}$, $\pmb C \in R^{r \times n}$, and assume that $\pmb A$ and $(\pmb I+\pmb{CA}^{-1}\pmb B)$ are invertible. Then

$$(\pmb A+\pmb {BC})^{-1}=\pmb A^{-1}-\pmb A^{-1}\pmb{B}(\pmb I+\pmb {CA}^{-1}\pmb{B})^{-1}\pmb{CA}^{-1}.\tag{1-5}$$

Applying (1-5) to (1-2) gives

$$\pmb P(t)=\pmb P(t-1)-\frac{\pmb P(t-1)\pmb{\varphi}(t)\pmb{\varphi}^T(t)\pmb P(t-1)}{1+\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)}\tag{1-6}$$

Right-multiplying both sides by the vector $\pmb \varphi(t)$ gives

$$\begin{aligned}\pmb P(t)\pmb \varphi(t)&=\pmb P(t-1)\pmb \varphi(t)-\frac{\pmb P(t-1)\pmb{\varphi}(t)\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)}{1+\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)} \\&=\pmb P(t-1)\pmb \varphi(t)\left[1-\frac{\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)}{1+\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)}\right] \\&=\frac{\pmb P(t-1)\pmb{\varphi}(t)}{1+\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)}=\pmb L(t). \end{aligned}$$

Substituting $\pmb L(t)$ into (1-6) gives

$$\begin{aligned}\pmb P(t)&=\pmb P(t-1)-\frac{\pmb P(t-1)\pmb{\varphi}(t)\pmb{\varphi}^T(t)\pmb P(t-1)}{1+\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)} \\&=\pmb P(t-1)-\pmb L(t)\pmb{\varphi}^T(t)\pmb P(t-1) \\&=[\pmb I-\pmb L(t)\pmb{\varphi}^T(t)]\pmb P(t-1),\quad \pmb P(0)=p_0\pmb I.\end{aligned}$$
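For the rank-one case used here ($\pmb B=\pmb\varphi$, $\pmb C=\pmb\varphi^T$, $r=1$), the lemma (1-5) reduces to the Sherman–Morrison formula, where the factor $(1+\pmb\varphi^T\pmb A^{-1}\pmb\varphi)$ is a scalar. This is easy to verify numerically (synthetic matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)          # a well-conditioned invertible A
phi = rng.standard_normal(n)

lhs = np.linalg.inv(A + np.outer(phi, phi))
A_inv = np.linalg.inv(A)
# eq. (1-5) with B = phi, C = phi^T: the "inverse" in the middle is a scalar division
rhs = A_inv - np.outer(A_inv @ phi, phi @ A_inv) / (1.0 + phi @ A_inv @ phi)
print(np.max(np.abs(lhs - rhs)))     # near machine precision
```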
In summary, the recursive least squares (RLS) algorithm can also be written as

$$\hat{\pmb{\theta}}(t)=\hat{\pmb{\theta}}(t-1)+\pmb L(t)[y(t)-\pmb{\varphi}^T(t)\hat{\pmb{\theta}}(t-1)],\tag{1-7}$$

$$\pmb L(t)=\pmb P(t-1)\pmb{\varphi}(t)[1+\pmb{\varphi}^T(t)\pmb P(t-1)\pmb{\varphi}(t)]^{-1},\tag{1-8}$$

$$\pmb P(t)=[\pmb I-\pmb L(t)\pmb{\varphi}^T(t)]\pmb P(t-1),\quad \pmb P(0)=p_0\pmb I.\tag{1-9}$$
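The inverse-free form (1-7)–(1-9) is the one normally implemented: the bracketed term in (1-8) is a scalar, so each update costs only matrix-vector products. A minimal Python sketch (synthetic regressors; the function name and test system are mine):

```python
import numpy as np

def rls(Phi, y, p0=1e6):
    """Recursive least squares, eqs. (1-7)-(1-9); no matrix inverse needed."""
    T, n = Phi.shape
    theta = np.ones(n) / p0            # small initial estimate, 1_n / p0
    P = p0 * np.eye(n)                 # P(0) = p0 * I
    for t in range(T):
        phi = Phi[t]
        L = P @ phi / (1.0 + phi @ P @ phi)        # gain vector, eq. (1-8)
        theta = theta + L * (y[t] - phi @ theta)   # eq. (1-7)
        P = P - np.outer(L, phi @ P)               # eq. (1-9)
    return theta

rng = np.random.default_rng(4)
theta_true = np.array([1.0, 2.0, -0.5])
Phi = rng.standard_normal((500, 3))
y = Phi @ theta_true + 0.01 * rng.standard_normal(500)
print(rls(Phi, y))  # close to theta_true
```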

Note: in recursive algorithms, the initial values are usually chosen as
$\pmb P(0)=$ a very large symmetric positive-definite matrix, e.g. $\pmb P(0)=p_0\pmb I$, $p_0=10^6\gg1$;
$\hat{\pmb{\theta}}(0)=$ a very small real vector, e.g. $\hat{\pmb{\theta}}(0)=\pmb 1_n/p_0$.

II. MATLAB Simulation

Consider a CAR-model simulation object, shown in the figure below:

$$\begin{aligned}A(z)y(t)&=B(z)u(t)+v(t), \\A(z)&=1+a_1z^{-1}+a_2z^{-2}+a_3z^{-3}=1-1.40z^{-1}+0.50z^{-2}+0.10z^{-3}, \\B(z)&=b_1z^{-1}+b_2z^{-2}+b_3z^{-3}=0.50z^{-1}-0.60z^{-2}-0.70z^{-3}, \\\pmb \theta&=[a_1,a_2,a_3,b_1,b_2,b_3]^T=[-1.40,0.50,0.10,0.50,-0.60,-0.70]^T.\end{aligned}$$

(Figure: block diagram of the model)

Reference program:

%Note: run once with sigma=0.1; the program saves the estimated parameters to the file data1. Then set sigma=1 and run again; the program compares the parameter-estimation errors for sigma=0.1 and sigma=1.

clear all;
% initialize the model
FF = 1;                           % forgetting factor (FF = 1: no forgetting)
sigma = 0.1;                      % noise standard deviation: sigma = 0.1 and 1.0
PlotLength = 3000; length1 = PlotLength + 100;
na = 3; nb = 3; n = na + nb;
a = [1,-1.4,0.5,0.1]; b = [0,0.5,-0.6,-0.7]; d = [1 0 0 0];
par0 = [a(2:na+1), b(2:nb+1)]';   % true parameter vector theta
p0 = 1e6; P = eye(n)*p0; r = 1;   % P(0) = p0*I, a very large symmetric positive-definite matrix
par1 = ones(n,1)*1e-6;            % initial estimate theta(0) = 1_n/p0, a very small real vector

% generate input/output data
rand('state',40); % randn('state',1);
eta = randn(length1,2);           % standard normal random numbers
u = eta(:,1); v = eta(:,2)*sigma; % input sequence u and disturbance v with standard deviation sigma
clear eta;
Gz = tf(b,a,1), Gn = tf(d,a,1)    % transfer functions Gz and Gn
y = lsim(Gz,u) + lsim(Gn,v);      % system output y from input u and disturbance v

% recursive least squares (RLS) algorithm
jj = 0; j1 = 0;
for t = 20:length1
    jj = jj + 1;
    varphi = [-y(t-1:-1:t-na); u(t-1:-1:t-nb)];
    L = P*varphi/(FF + varphi'*P*varphi);    % gain vector, eq. (1-8)
    P = P - L*varphi'*P;                     % eq. (1-9)
    par1 = par1 + L*(y(t) - varphi'*par1);   % eq. (1-7)
    delta1 = norm(par1-par0)/norm(par0);     % relative error w.r.t. the true values
    ls(jj,:) = [jj,par1',delta1];            % store the estimate and error at every step
    if (jj==100)|(jj==200)|(jj==500)|mod(jj,1000)==0
        j1 = j1 + 1;
        ls_100(j1,:) = [jj,par1',delta1*100]; % store estimates/errors at t = 100,200,500,1000,2000,3000
    end
    if jj == PlotLength  % stop at t = 3000
        break
    end
end

% plotting
ls_100(j1+1,:) = [0,par0',0];
figure(1)
jk = (17:10:PlotLength-1)';
plot(ls(jk,1),ls(jk,n+2));
if sigma == 0.1
    data1 = [ls(:,1),ls(:,n+2)];
    save data1 data1
else % sigma == 1.0
    load data1
    z0 = [data1, ls(:,n+2)];
    figure(3);
    plot(z0(jk,1),z0(jk,2),'k',z0(jk,1),z0(jk,3),'b')
    axis([0,3000,0,0.33])
    text(600,0.058,'{\it\sigma}^2=1.00^2')
    text(600,0.13,'{\it\sigma}^2=0.10^2')
    line([247,620],[0.024,0.119])
end
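For readers without MATLAB, the same experiment can be sketched in Python with NumPy only, simulating the CAR difference equation directly instead of using `tf`/`lsim`. The seeds differ from the MATLAB run, so the exact numbers differ from the table below, but the behavior is the same.

```python
import numpy as np

# CAR model: A(z) y(t) = B(z) u(t) + v(t)
a = [1.0, -1.40, 0.50, 0.10]         # A(z) coefficients
b = [0.0, 0.50, -0.60, -0.70]        # B(z) coefficients (b0 = 0)
theta_true = np.array(a[1:] + b[1:])
na, nb = 3, 3
n = na + nb

rng = np.random.default_rng(0)
T, sigma = 3100, 0.1
u = rng.standard_normal(T)
v = sigma * rng.standard_normal(T)

# difference equation: y(t) = -a1 y(t-1) - a2 y(t-2) - a3 y(t-3)
#                             + b1 u(t-1) + b2 u(t-2) + b3 u(t-3) + v(t)
y = np.zeros(T)
for t in range(3, T):
    y[t] = (-a[1]*y[t-1] - a[2]*y[t-2] - a[3]*y[t-3]
            + b[1]*u[t-1] + b[2]*u[t-2] + b[3]*u[t-3] + v[t])

# RLS, eqs. (1-7)-(1-9)
p0 = 1e6
theta = np.ones(n) / p0
P = p0 * np.eye(n)
for t in range(20, T):
    phi = np.concatenate([-y[t-1:t-na-1:-1], u[t-1:t-nb-1:-1]])
    L = P @ phi / (1.0 + phi @ P @ phi)
    theta = theta + L * (y[t] - phi @ theta)
    P = P - np.outer(L, phi @ P)

delta = np.linalg.norm(theta - theta_true) / np.linalg.norm(theta_true)
print(theta, delta)  # the relative error shrinks as T grows
```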

Run results:

The table below is produced from the array `ls_100`.

(Table: RLS parameter estimates and their errors)

(Figure: run results)


Summary

(1) The recursive least squares algorithm is quite robust.
(2) As the data length increases, the parameter estimation error keeps decreasing.
(3) The larger the amplitude of the input and output signals relative to the noise, the higher the parameter estimation accuracy.


References

Ding Feng (丁锋). 系统辨识新论 (New Theory of System Identification) [M]. Beijing: Science Press, 2013: 120.
