摘要: 首先,已知最小二乘法的解后,推导出解的递推形式;然后通过一个简单的公式y=x+w的形式引入遗忘因子,并改写成递推形式,最后将遗忘因子引入带递推的最小二乘法中。

递推最小二乘法

对多组数据 x ⃗ i \vec{x}_i x i y i y_i yi,满足
y i = x ⃗ i T θ ⃗ y_i = \vec{x}^\mathrm{T}_i\vec{\theta} yi=x iTθ
其中 x ⃗ i \vec{x}_i x i 是输入数据向量, y i y_i yi 是输出数据标量。写成矩阵形式
y ⃗ = X θ ⃗ \vec{y} = X\vec{\theta} y =Xθ
其中(以3组数据、2个未知参数为例)
y ⃗ = [ y 1 y 2 y 3 ] , X = [ x ⃗ 1 T x ⃗ 2 T x ⃗ 3 T ] , θ = [ θ 1 θ 2 ] , y i = x i 1 θ 1 + x i 2 θ 2 \vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, X = \begin{bmatrix} \vec{x}^\mathrm{T}_1 \\ \vec{x}^\mathrm{T}_2 \\ \vec{x}^\mathrm{T}_3 \end{bmatrix}, \theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}, y_i=x_{i1}\theta_1+x_{i2}\theta_2 y = y1y2y3 ,X= x 1Tx 2Tx 3T ,θ=[θ1θ2],yi=xi1θ1+xi2θ2
完整地写作(以3组数据、2个未知参数为例)
[ y 1 y 2 y 3 ] = [ x 11 x 12 x 21 x 22 x 31 x 32 ] [ θ 1 θ 2 ] \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} =\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix} y1y2y3 = x11x21x31x12x22x32 [θ1θ2]
k k k组数据时记作
X k T = [ x ⃗ 1 x ⃗ 2 ⋯ x ⃗ k ] y ⃗ k = [ y 1 y 2 ⋯ y k ] X^\mathrm{T}_k = \begin{bmatrix} \vec{x}_1 & \vec{x}_2 & \cdots & \vec{x}_k \end{bmatrix} \\ \vec{y}_k=\begin{bmatrix} y_1 & y_2 & \cdots & y_k\end{bmatrix} XkT=[x 1x 2x k]y k=[y1y2yk]
(此时注意区分一些符号,例如 y ⃗ k \vec{y}_k y k y k y_k yk X X X x ⃗ \vec{x} x 等)
已知最小二乘法的解为
θ ⃗ = ( X T X ) − 1 X T y ⃗ (1) \vec{\theta} =(X^\mathrm{T}X)^{-1} X^\mathrm{T}\vec{y} \tag{1} θ =(XTX)1XTy (1)
P = ( X T X ) − 1 P = (X^\mathrm{T}X)^{-1} P=(XTX)1
则加上递推后
P k − 1 = X k T X k = ∑ i = 1 k x ⃗ i x ⃗ i T = ∑ i = 1 k − 1 x ⃗ i x ⃗ i T + x ⃗ k x ⃗ k T = P k − 1 − 1 + x ⃗ k x ⃗ k T (2) \begin{aligned} P_k^{-1} =& X_k^\mathrm{T}X_k = \sum_{i=1}^k\vec{x}_i\vec{x}^\mathrm{T}_i\\ =& \sum_{i=1}^{k-1}\vec{x}_i\vec{x}^\mathrm{T}_i +\vec{x}_k\vec{x}^\mathrm{T}_k \\ =& P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\mathrm{T} \end{aligned} \tag{2} Pk1===XkTXk=i=1kx ix iTi=1k1x ix iT+x kx kTPk11+x kx kT(2)
同理可得
X k T y ⃗ k = X k − 1 T y ⃗ k − 1 + x ⃗ k y k (3) X_k^\mathrm{T}\vec{y}_k =X_{k-1}^\mathrm{T}\vec{y}_{k-1} +\vec{x}_ky_k \tag{3} XkTy k=Xk1Ty k1+x kyk(3)
于是代入式(1)得到
θ ⃗ k = P k X k T y ⃗ k = P k ( X k − 1 T y ⃗ k − 1 + x ⃗ k y k ) = P k ( P k − 1 − 1 θ ⃗ k − 1 + x ⃗ k y k ) = P k ( P k − 1 θ ⃗ k − 1 − x ⃗ k x ⃗ k T θ k − 1 + x ⃗ k y k ) = θ ⃗ k − 1 + P k x ⃗ k ( y k − x ⃗ k T θ k − 1 ) = θ ⃗ k − 1 + ( P k − 1 − 1 + x ⃗ k x ⃗ k T ) − 1 x ⃗ k ( y k − x ⃗ k T θ k − 1 ) = θ ⃗ k − 1 + ( P k − 1 − P k − 1 x ⃗ k x ⃗ k T P k − 1 1 + x ⃗ k T P k − 1 x ⃗ k ) x ⃗ k ( y k − x ⃗ k T θ k − 1 ) \begin{aligned} \vec{\theta}_k =& P_kX_k^\mathrm{T}\vec{y}_k \\ =& P_k(X_{k-1}^\mathrm{T}\vec{y}_{k-1} +\vec{x}_ky_k) \\ =& P_k(P_{k-1}^{-1}\vec{\theta}_{k-1} +\vec{x}_ky_k) \\ =& P_k(P_k^{-1}\vec{\theta}_{k-1} -\vec{x}_k\vec{x}_k^\mathrm{T}\theta_{k-1}+\vec{x}_ky_k) \\ =& \vec{\theta}_{k-1} + P_k\vec{x}_k (y_k-\vec{x}_k^\mathrm{T}\theta_{k-1}) \\ =& \vec{\theta}_{k-1} +(P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\mathrm{T})^{-1} \vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\theta_{k-1}) \\ =& \vec{\theta}_{k-1} +(P_{k-1}-\frac{P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}}{1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k}) \vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\theta_{k-1}) \\ \end{aligned} θ k=======PkXkTy kPk(Xk1Ty k1+x kyk)Pk(Pk11θ k1+x kyk)Pk(Pk1θ k1x kx kTθk1+x kyk)θ k1+Pkx k(ykx kTθk1)θ k1+(Pk11+x kx kT)1x k(ykx kTθk1)θ k1+(Pk11+x kTPk1x kPk1x kx kTPk1)x k(ykx kTθk1)
其中倒数第3行到倒数第一行的过程
P k = P k − 1 − P k − 1 x ⃗ k x ⃗ k T P k − 1 1 + x ⃗ k T P k − 1 x ⃗ k P_k = P_{k-1}-\frac{P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}}{1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} Pk=Pk11+x kTPk1x kPk1x kx kTPk1
用到了下面的矩阵求逆公式及其引理
( A + B C D ) − 1 = A − 1 − A − 1 B ( D A − 1 B + C − 1 ) − 1 D A − 1 ( A + u ⃗ u ⃗ T ) − 1 = A − 1 − A − 1 u ⃗ u ⃗ T A − 1 1 + u ⃗ T A − 1 u ⃗ \begin{aligned} (A+BCD)^{-1} =& A^{-1}-A^{-1} B (DA^{-1}B+C^{-1})^{-1}DA^{-1} \\ (A+\vec{u}\vec{u}^\text{T})^{-1} =& A^{-1} -\frac{A^{-1}\vec{u}\vec{u}^\text{T} A^{-1}} {1+\vec{u}^\text{T}A^{-1}\vec{u}} \end{aligned} (A+BCD)1=(A+u u T)1=A1A1B(DA1B+C1)1DA1A11+u TA1u A1u u TA1

K k = P k − 1 x ⃗ k 1 + x ⃗ k T P k − 1 x ⃗ k K_k = \frac{P_{k-1}\vec{x}_k} {1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} Kk=1+x kTPk1x kPk1x k

P k = ( I − K k x ⃗ k T ) P k − 1 P k x ⃗ k = P k − 1 x ⃗ k − K k x ⃗ k T P k − 1 x ⃗ k = P k − 1 x ⃗ k ( 1 + x ⃗ k T P k − 1 x ⃗ k ) − P k − 1 x ⃗ k x ⃗ k T P k − 1 x ⃗ k 1 + x ⃗ k T P k − 1 x ⃗ k = P k − 1 x ⃗ k 1 + x ⃗ k T P k − 1 x ⃗ k = K k \begin{aligned} P_k =& (I-K_k\vec{x}_k^\mathrm{T})P_{k-1}\\ P_k\vec{x}_k =& P_{k-1}\vec{x}_k -K_k\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k \\ =& \frac{P_{k-1}\vec{x}_k (1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k) -P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k} {1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& \frac{P_{k-1}\vec{x}_k} {1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& K_k \\ \end{aligned} Pk=Pkx k====(IKkx kT)Pk1Pk1x kKkx kTPk1x k1+x kTPk1x kPk1x k(1+x kTPk1x k)Pk1x kx kTPk1x k1+x kTPk1x kPk1x kKk
总结成递推公式得到
K k = P k − 1 x ⃗ k 1 + x ⃗ k T P k − 1 x ⃗ k P k = ( I − K k x ⃗ k T ) P k − 1 θ ⃗ k = θ ⃗ k − 1 + K k ( y k − x ⃗ k T θ k − 1 ) \begin{aligned} K_k =& \frac{P_{k-1}\vec{x}_k}{1+\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k}\\ P_k =& (I-K_k\vec{x}_k^\mathrm{T})P_{k-1} \\ \vec{\theta}_k =& \vec{\theta}_{k-1} +K_k(y_k-\vec{x}_k^\mathrm{T}\theta_{k-1}) \end{aligned} Kk=Pk=θ k=1+x kTPk1x kPk1x k(IKkx kT)Pk1θ k1+Kk(ykx kTθk1)

一些注意事项

  因为 P = ( X T X ) − 1 P=(X^\text{T}X)^{-1} P=(XTX)1 x ⃗ 1 x ⃗ 1 T \vec{x}_1\vec{x}_1^\text{T} x 1x 1T 一般很小,所以实际使用时一般把 P 1 P_1 P1 初始化为一个接近无穷大的单位阵。
  可以看出 P P P 是个对称矩阵,如果因为数值计算的原因导致 P P P 不严格对称时,代码里需要加一步: P ← ( P + P T ) / 2 P\leftarrow(P+P^\text{T})/2 P(P+PT)/2

引入遗忘因子

先考虑一个比较简单的情况
x n = θ + w n x_n=\theta+w_n xn=θ+wn
其中 θ \theta θ 是要估计的参数, w n w_n wn 是高斯白噪声, x n x_n xn 是观测到的数据,总共有 N N N 组数据,将 N N N 组数据取平均得到 θ \theta θ 的一个估计值为
θ ^ = 1 N ∑ n = 1 N x n \hat{\theta}=\frac{1}{N}\sum_{n=1}^Nx_n θ^=N1n=1Nxn
写成递推形式为
θ ^ n = θ ^ n − 1 + 1 n ( x n − θ ^ n − 1 ) \hat\theta_n=\hat\theta_{n-1}+\frac{1}{n}(x_n-\hat\theta_{n-1}) θ^n=θ^n1+n1(xnθ^n1)
例如,
θ ^ 1 = x 1 θ ^ 2 = θ ^ 1 + 1 2 ( x 2 − θ ^ 1 ) = 1 2 ( x 1 + x 2 ) θ ^ 3 = θ ^ 2 + 1 3 ( x 3 − θ ^ 2 ) = 1 3 ( x 1 + x 2 + x 3 ) \begin{aligned} \hat\theta_1 =& x_1\\ \hat\theta_2 =& \hat\theta_1+\frac{1}{2}(x_2-\hat\theta_1) =\frac{1}{2}(x_1+x_2) \\ \hat\theta_3 =& \hat\theta_2+\frac{1}{3}(x_3-\hat\theta_2) =\frac{1}{3}(x_1+x_2+x_3) \\ \end{aligned} θ^1=θ^2=θ^3=x1θ^1+21(x2θ^1)=21(x1+x2)θ^2+31(x3θ^2)=31(x1+x2+x3)
现在对之前的值乘一个衰减系数 λ ∈ ( 0 , 1 ) \lambda\in(0,1) λ(0,1),也就是
θ ^ 2 = 1 λ + 1 ( λ x 1 + x 2 ) θ ^ 3 = 1 λ 2 + λ + 1 ( λ 2 x 1 + λ x 2 + x 3 ) ⋮ \begin{aligned} \hat\theta_2 =& \frac{1}{\lambda+1}(\lambda x_1+x_2) \\ \hat\theta_3 =& \frac{1}{\lambda^2+\lambda+1}(\lambda^2x_1+\lambda x_2+x_3) \\ \vdots \end{aligned} θ^2=θ^3=λ+11(λx1+x2)λ2+λ+11(λ2x1+λx2+x3)
写成递推形式为
θ ^ 2 = 1 λ + 1 ( λ x 1 + x 2 ) θ ^ 3 = 1 λ 2 + λ + 1 ( λ ( λ + 1 ) θ ^ 2 + x 3 ) = θ ^ 2 + 1 λ 2 + λ + 1 ( x 3 − θ ^ 2 ) \begin{aligned} \hat\theta_2=&\frac{1}{\lambda+1}(\lambda x_1+x_2) \\ \hat\theta_3=&\frac{1}{\lambda^2+\lambda+1}(\lambda(\lambda+1)\hat\theta_2+x_3) \\ =&\hat\theta_2+\frac{1}{\lambda^2+\lambda+1}(x_3-\hat\theta_2) \end{aligned} θ^2=θ^3==λ+11(λx1+x2)λ2+λ+11(λ(λ+1)θ^2+x3)θ^2+λ2+λ+11(x3θ^2)
n → ∞ n\rightarrow\infty n 时,递推形式可以近似为
θ ^ n = θ ^ n − 1 + 1 ∑ i = 0 n − 1 λ i ( x n − θ ^ n − 1 ) = θ ^ n − 1 + ( 1 − λ ) ( x n − θ ^ n − 1 ) \begin{aligned} \hat{\theta}_n =& \hat\theta_{n-1}+\frac{1}{\sum_{i=0}^{n-1}\lambda^{i}}(x_n-\hat\theta_{n-1}) \\ =& \hat\theta_{n-1}+(1-\lambda)(x_n-\hat\theta_{n-1}) \end{aligned} θ^n==θ^n1+i=0n1λi1(xnθ^n1)θ^n1+(1λ)(xnθ^n1)

递推最小二乘中的遗忘因子

类似地,在最小二乘的误差 S i = ∣ ∣ y i − x ⃗ i T θ ⃗ ∣ ∣ 2 S_i=||y_i-\vec{x}_i^\text{T}\vec\theta||^2 Si=∣∣yix iTθ 2 中加入遗忘因子,得到总误差(以3组数据为例)
J 3 = λ 2 S 1 + λ S 2 + S 3 J_3=\lambda^2S_1+\lambda S_2+S_3 J3=λ2S1+λS2+S3
完整的方程可以写作
y ⃗ 3 = X 3 θ [ λ y 1 λ y 2 y 3 ] = [ λ x 11 λ x 12 λ x 21 λ x 22 x 31 x 32 ] [ θ 1 θ 2 ] \vec{y}_3=X_3\theta \\ \begin{bmatrix} \lambda y_1 \\ \sqrt{\lambda}y_2 \\ y_3 \end{bmatrix} =\begin{bmatrix} \lambda x_{11} & \lambda x_{12} \\ \sqrt{\lambda}x_{21} & \sqrt{\lambda}x_{22} \\ x_{31} & x_{32} \\ \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix} y 3=X3θ λy1λ y2y3 = λx11λ x21x31λx12λ x22x32 [θ1θ2]
类似地代入最小二乘的解式(1),并求式(2)(3)的递推形式
X 2 T X 2 = [ λ x 11 x 21 λ x 12 x 22 ] [ λ x 11 λ x 12 x 21 x 22 ] X 3 T X 3 = [ λ x 11 λ x 21 x 31 λ x 12 λ x 22 x 32 ] [ λ x 11 λ x 12 λ x 21 λ x 22 x 31 x 32 ] = λ X 2 T X 2 + x ⃗ 3 x ⃗ 3 T X 2 T y ⃗ 2 = [ λ x 11 x 21 λ x 12 x 22 ] [ λ y 1 y 2 ] X 3 T y ⃗ 3 = [ λ x 11 λ x 21 x 31 λ x 12 λ x 22 x 32 ] [ λ y 1 λ y 2 y 3 ] = λ X 2 T y ⃗ 2 + x ⃗ 3 y 3 \begin{aligned} X_2^\text{T}X_2 =& \begin{bmatrix} \sqrt{\lambda}x_{11} & x_{21} \\ \sqrt{\lambda}x_{12} & x_{22} \\ \end{bmatrix} \begin{bmatrix} \sqrt{\lambda}x_{11} & \sqrt{\lambda} x_{12} \\ x_{21} & x_{22} \end{bmatrix} \\ X_3^\text{T}X_3 =& \begin{bmatrix} \lambda x_{11} & \sqrt{\lambda}x_{21} & x_{31} \\ \lambda x_{12} & \sqrt{\lambda}x_{22} & x_{32} \\ \end{bmatrix} \begin{bmatrix} \lambda x_{11} & \lambda x_{12} \\ \sqrt{\lambda}x_{21} & \sqrt{\lambda} x_{22} \\ x_{31} & x_{32} \end{bmatrix} \\ =& \lambda X_2^\text{T}X_2 + \vec{x}_3\vec{x}_3^\text{T} \\ X_2^\text{T}\vec{y}_2 =& \begin{bmatrix} \sqrt{\lambda}x_{11} & x_{21} \\ \sqrt{\lambda}x_{12} & x_{22} \\ \end{bmatrix} \begin{bmatrix} \sqrt{\lambda}y_1 \\ y_2 \end{bmatrix} \\ X_3^\text{T}\vec{y}_3 =& \begin{bmatrix} \lambda x_{11} & \sqrt{\lambda}x_{21} & x_{31} \\ \lambda x_{12} & \sqrt{\lambda}x_{22} & x_{32} \\ \end{bmatrix} \begin{bmatrix} \lambda y_1 \\ \sqrt{\lambda}y_2 \\ y_3 \end{bmatrix} \\ =& \lambda X_2^\text{T}\vec{y}_2 + \vec{x}_3y_3 \end{aligned} X2TX2=X3TX3==X2Ty 2=X3Ty 3==[λ x11λ x12x21x22][λ x11x21λ x12x22][λx11λx12λ x21λ x22x31x32] λx11λ x21x31λx12λ x22x32 λX2TX2+x 3x 3T[λ x11λ x12x21x22][λ y1y2][λx11λx12λ x21λ x22x31x32] λy1λ y2y3 λX2Ty 2+x 3y3

P k − 1 = λ P k − 1 − 1 + x ⃗ k x ⃗ k T X k T y ⃗ k = λ X k − 1 T y ⃗ k − 1 + x ⃗ k y k \begin{aligned} P_k^{-1} =& \lambda P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\text{T} \\ X_k^\text{T}\vec{y}_k =& \lambda X_{k-1}^\text{T}\vec{y}_{k-1} + \vec{x}_ky_k \end{aligned} Pk1=XkTy k=λPk11+x kx kTλXk1Ty k1+x kyk
代入式(1)得到
θ ⃗ k = P k X k T y ⃗ k = P k ( λ X k − 1 T y ⃗ k − 1 + x ⃗ k y k ) = P k ( λ P k − 1 − 1 θ ⃗ k − 1 + x ⃗ k y k ) = P k ( P k − 1 θ ⃗ k − 1 − x ⃗ k x ⃗ k T θ k − 1 + x ⃗ k y k ) = θ ⃗ k − 1 + P k x ⃗ k ( y k − x ⃗ k T θ k − 1 ) = θ ⃗ k − 1 + ( λ P k − 1 − 1 + x ⃗ k x ⃗ k T ) − 1 x ⃗ k ( y k − x ⃗ k T θ k − 1 ) = θ ⃗ k − 1 + ( P k − 1 λ − P k − 1 λ x ⃗ k x ⃗ k T P k − 1 λ 1 + x ⃗ k T P k − 1 λ x ⃗ k ) x ⃗ k ( y k − x ⃗ k T θ k − 1 ) = θ ⃗ k − 1 + ( P k − 1 λ − 1 λ P k − 1 x ⃗ k x ⃗ k T P k − 1 λ + x ⃗ k T P k − 1 x ⃗ k ) x ⃗ k ( y k − x ⃗ k T θ k − 1 ) \begin{aligned} \vec{\theta}_k =& P_kX_k^\text{T}\vec{y}_k \\ =& P_k(\lambda X_{k-1}^\text{T}\vec{y}_{k-1} + \vec{x}_ky_k) \\ =& P_k(\lambda P_{k-1}^{-1}\vec{\theta}_{k-1} + \vec{x}_ky_k) \\ =& P_k(P_k^{-1}\vec{\theta}_{k-1} -\vec{x}_k\vec{x}_k^\text{T}\theta_{k-1}+\vec{x}_ky_k) \\ =& \vec\theta_{k-1} + P_k\vec{x}_k(y_k-\vec{x}_k^\text{T}\theta_{k-1}) \\ =& \vec\theta_{k-1} + (\lambda P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\text{T})^{-1} \vec{x}_k(y_k-\vec{x}_k^\text{T}\theta_{k-1}) \\ =& \vec\theta_{k-1}+(\frac{P_{k-1}}{\lambda} -\frac{\frac{P_{k-1}}{\lambda}\vec{x}_k\vec{x}_k^\text{T} \frac{P_{k-1}}{\lambda}}{1+\vec{x}_k^\text{T}\frac{P_{k-1}}{\lambda}\vec{x}_k}) \vec{x}_k(y_k-\vec{x}_k^\text{T}\theta_{k-1}) \\ =& \vec\theta_{k-1}+(\frac{P_{k-1}}{\lambda} -\frac{1}{\lambda}\frac{P_{k-1}\vec{x}_k\vec{x}_k^\text{T}P_{k-1}} {\lambda + \vec{x}_k^\text{T}P_{k-1}\vec{x}_k}) \vec{x}_k(y_k-\vec{x}_k^\text{T}\theta_{k-1}) \\ \end{aligned} θ k========PkXkTy kPk(λXk1Ty k1+x kyk)Pk(λPk11θ k1+x kyk)Pk(Pk1θ k1x kx kTθk1+x kyk)θ k1+Pkx k(ykx kTθk1)θ k1+(λPk11+x kx kT)1x k(ykx kTθk1)θ k1+(λPk11+x kTλPk1x kλPk1x kx kTλPk1)x k(ykx kTθk1)θ k1+(λPk1λ1λ+x kTPk1x kPk1x kx kTPk1)x k(ykx kTθk1)

K k = P k − 1 x ⃗ k λ + x ⃗ k T P k − 1 x ⃗ k K_k = \frac{P_{k-1}\vec{x}_k} {\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} Kk=λ+x kTPk1x kPk1x k

P k = 1 λ ( I − K k x ⃗ k T ) P k − 1 P k x ⃗ k = 1 λ ( P k − 1 x ⃗ k − K k x ⃗ k T P k − 1 x ⃗ k ) = 1 λ P k − 1 x ⃗ k ( λ + x ⃗ k T P k − 1 x ⃗ k ) − P k − 1 x ⃗ k x ⃗ k T P k − 1 x ⃗ k λ + x ⃗ k T P k − 1 x ⃗ k = 1 λ P k − 1 x ⃗ k λ λ + x ⃗ k T P k − 1 x ⃗ k = K k \begin{aligned} P_k =& \frac{1}{\lambda}(I-K_k\vec{x}_k^\mathrm{T})P_{k-1}\\ P_k\vec{x}_k =& \frac{1}{\lambda}(P_{k-1}\vec{x}_k -K_k\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k) \\ =& \frac{1}{\lambda}\frac{P_{k-1}\vec{x}_k (\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k) -P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k} {\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& \frac{1}{\lambda}\frac{P_{k-1}\vec{x}_k\lambda} {\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& K_k \\ \end{aligned} Pk=Pkx k====λ1(IKkx kT)Pk1λ1(Pk1x kKkx kTPk1x k)λ1λ+x kTPk1x kPk1x k(λ+x kTPk1x k)Pk1x kx kTPk1x kλ1λ+x kTPk1x kPk1x kλKk
总结成递推公式得到
K k = P k − 1 x ⃗ k λ + x ⃗ k T P k − 1 x ⃗ k P k = 1 λ ( I − K k x ⃗ k T ) P k − 1 θ ⃗ k = θ ⃗ k − 1 + K k ( y k − x ⃗ k T θ k − 1 ) \begin{aligned} K_k =& \frac{P_{k-1}\vec{x}_k}{\lambda+\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k}\\ P_k =& \frac{1}{\lambda}(I-K_k\vec{x}_k^\mathrm{T})P_{k-1} \\ \vec{\theta}_k =& \vec{\theta}_{k-1} +K_k(y_k-\vec{x}_k^\mathrm{T}\theta_{k-1}) \end{aligned} Kk=Pk=θ k=λ+x kTPk1x kPk1x kλ1(IKkx kT)Pk1θ k1+Kk(ykx kTθk1)
与不带遗忘因子的公式相比,第3个式子 θ ⃗ k \vec{\theta}_k θ k 不变,前两个稍微有变化。

一个例子与代码

  对方程 y = a x 2 + b x + c y=ax^2+bx+c y=ax2+bx+c,输入多组数据, x x x 取一些高斯噪声,以4组数据为例,
[ y 1 y 2 y 3 y 4 ] = [ x 1 2 x 1 1 x 2 2 x 2 1 x 3 2 x 3 1 x 4 2 x 4 1 ] [ a b c ] \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} =\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \\ x_4^2 & x_4 & 1 \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} y1y2y3y4 = x12x22x32x42x1x2x3x41111 abc
代码使用了自用的矩阵运算库zhnmat,见 自用的矩阵运算库zhnmat使用说明

#include <iostream>
#include <cmath>
#include "zhnmat.hpp"
using namespace zhnmat;
using namespace std;
int main() {
    constexpr double a = 0.5;
    constexpr double b = 1.1;
    constexpr double c = 2.1;
    double U, V, randx;
    double y, lambda=0.5;
    Mat K;
    Mat vx(3, 1);
    Mat P = eye(3)*1e6;
    Mat theta(3, 1);
    for (int i = 0; i < 10; i++) {
        U = (rand()+1.0) / (RAND_MAX+1.0);
        V = (rand()+1.0) / (RAND_MAX+1.0);
        randx = sqrt(-2.0 * log(U))* cos(6.283185307179586477 * V);  // `randx`服从标准正态分布
        randx *= 2;
        vx.set(0, 0, randx*randx);
        vx.set(1, 0, randx);
        vx.set(2, 0, 1);
        y = a*randx*randx + b*randx + c;
        K = P * vx / (lambda + (vx.T() * P * vx).at(0, 0));
        P = (eye(3) - K * vx.T()) * P / lambda;
        theta = theta + K * (y - (vx.T() * theta).at(0, 0));
        cout << "theta" << theta << endl;
    }
    return 0;
}
Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐