Derivation of Recursive Least Squares with a Forgetting Factor
Abstract: Starting from the known closed-form least-squares solution, we first derive its recursive form; the forgetting factor is then introduced through a simple model of the form $x_n=\theta+w_n$ and rewritten recursively; finally, the forgetting factor is incorporated into recursive least squares.
Recursive Least Squares
Consider multiple data pairs $\vec{x}_i$ and $y_i$ satisfying
$$y_i = \vec{x}^\mathrm{T}_i\vec{\theta}$$
where $\vec{x}_i$ is the input data vector and $y_i$ is the output scalar. In matrix form,
$$\vec{y} = X\vec{\theta}$$
where (taking 3 data points and 2 unknown parameters as an example)
$$\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix},\quad X = \begin{bmatrix} \vec{x}^\mathrm{T}_1 \\ \vec{x}^\mathrm{T}_2 \\ \vec{x}^\mathrm{T}_3 \end{bmatrix},\quad \vec{\theta} = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix},\quad y_i=x_{i1}\theta_1+x_{i2}\theta_2$$
Written out in full (again with 3 data points and 2 unknown parameters),
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} =\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$$
With $k$ data points, write
$$X^\mathrm{T}_k = \begin{bmatrix} \vec{x}_1 & \vec{x}_2 & \cdots & \vec{x}_k \end{bmatrix},\qquad \vec{y}^\mathrm{T}_k=\begin{bmatrix} y_1 & y_2 & \cdots & y_k\end{bmatrix}$$
(Take care to distinguish similar-looking symbols here, e.g. $\vec{y}_k$ versus $y_k$, and $X$ versus $\vec{x}$.)
The least-squares solution is known to be
$$\vec{\theta} =(X^\mathrm{T}X)^{-1} X^\mathrm{T}\vec{y} \tag{1}$$
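As a quick numeric sanity check of Eq. (1), consider the scalar case with one parameter, where $X$ is a column vector and the solution reduces to a ratio of sums. The sketch below is our own illustration, not code from the original article; `ls_scalar` is a made-up name:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar least squares: with a single parameter, X is a column vector, so
// theta = (X^T X)^{-1} X^T y  reduces to  (sum_i x_i*y_i) / (sum_i x_i^2).
double ls_scalar(const std::vector<double>& x, const std::vector<double>& y) {
    double xty = 0.0, xtx = 0.0;
    for (std::size_t i = 0; i < x.size(); i++) {
        xty += x[i] * y[i];
        xtx += x[i] * x[i];
    }
    return xty / xtx;
}
```

Feeding noise-free data generated with $\theta = 2.5$, e.g. `x = {1, 2, 3}` and `y = {2.5, 5, 7.5}`, recovers 2.5 up to rounding.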
Let $P = (X^\mathrm{T}X)^{-1}$. Then, written recursively,
$$\begin{aligned} P_k^{-1} =& X_k^\mathrm{T}X_k = \sum_{i=1}^k\vec{x}_i\vec{x}^\mathrm{T}_i\\ =& \sum_{i=1}^{k-1}\vec{x}_i\vec{x}^\mathrm{T}_i +\vec{x}_k\vec{x}^\mathrm{T}_k \\ =& P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\mathrm{T} \end{aligned} \tag{2}$$
Similarly,
$$X_k^\mathrm{T}\vec{y}_k =X_{k-1}^\mathrm{T}\vec{y}_{k-1} +\vec{x}_ky_k \tag{3}$$
Substituting into Eq. (1) gives
$$\begin{aligned} \vec{\theta}_k =& P_kX_k^\mathrm{T}\vec{y}_k \\ =& P_k(X_{k-1}^\mathrm{T}\vec{y}_{k-1} +\vec{x}_ky_k) \\ =& P_k(P_{k-1}^{-1}\vec{\theta}_{k-1} +\vec{x}_ky_k) \\ =& P_k(P_k^{-1}\vec{\theta}_{k-1} -\vec{x}_k\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}+\vec{x}_ky_k) \\ =& \vec{\theta}_{k-1} + P_k\vec{x}_k (y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \\ =& \vec{\theta}_{k-1} +(P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\mathrm{T})^{-1} \vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \\ =& \vec{\theta}_{k-1} +\left(P_{k-1}-\frac{P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}}{1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k}\right) \vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \end{aligned}$$
Here the step from the third-to-last line to the last line,
$$P_k = P_{k-1}-\frac{P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}}{1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k}$$
uses the following matrix inversion lemma and its rank-one corollary:
$$\begin{aligned} (A+BCD)^{-1} =& A^{-1}-A^{-1} B (DA^{-1}B+C^{-1})^{-1}DA^{-1} \\ (A+\vec{u}\vec{u}^\mathrm{T})^{-1} =& A^{-1} -\frac{A^{-1}\vec{u}\vec{u}^\mathrm{T} A^{-1}} {1+\vec{u}^\mathrm{T}A^{-1}\vec{u}} \end{aligned}$$
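The rank-one corollary can be verified numerically. The sketch below (our own, for $2\times 2$ matrices passed as plain scalars) inverts $A+\vec{u}\vec{u}^\mathrm{T}$ directly and compares it against the right-hand side of the lemma; the function name is made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Compare the direct inverse of (A + u u^T) against the Sherman-Morrison
// formula A^{-1} - (A^{-1} u u^T A^{-1}) / (1 + u^T A^{-1} u) for a 2x2 A.
// Returns the maximum absolute entrywise difference between the two sides.
double sherman_morrison_gap(double a00, double a01, double a10, double a11,
                            double u0, double u1) {
    double det = a00*a11 - a01*a10;
    double Ai[2][2] = {{ a11/det, -a01/det}, {-a10/det, a00/det}};     // A^{-1}
    double M[2][2]  = {{a00+u0*u0, a01+u0*u1}, {a10+u1*u0, a11+u1*u1}}; // A + u u^T
    double dM = M[0][0]*M[1][1] - M[0][1]*M[1][0];
    double L[2][2]  = {{ M[1][1]/dM, -M[0][1]/dM}, {-M[1][0]/dM, M[0][0]/dM}};
    double Aiu[2] = {Ai[0][0]*u0 + Ai[0][1]*u1, Ai[1][0]*u0 + Ai[1][1]*u1}; // A^{-1} u
    double uAi[2] = {u0*Ai[0][0] + u1*Ai[1][0], u0*Ai[0][1] + u1*Ai[1][1]}; // u^T A^{-1}
    double denom = 1.0 + u0*Aiu[0] + u1*Aiu[1];
    double gap = 0.0;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            gap = std::fmax(gap, std::fabs(L[i][j] - (Ai[i][j] - Aiu[i]*uAi[j]/denom)));
    return gap;
}
```

For any invertible $A$ with $1+\vec{u}^\mathrm{T}A^{-1}\vec{u}\neq 0$, the gap should be at the level of floating-point rounding.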
Let
$$K_k = \frac{P_{k-1}\vec{x}_k} {1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k}$$
Then
$$\begin{aligned} P_k =& (I-K_k\vec{x}_k^\mathrm{T})P_{k-1}\\ P_k\vec{x}_k =& P_{k-1}\vec{x}_k -K_k\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k \\ =& \frac{P_{k-1}\vec{x}_k (1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k) -P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k} {1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& \frac{P_{k-1}\vec{x}_k} {1+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& K_k \end{aligned}$$
Summarizing, the recursion is
$$\begin{aligned} K_k =& \frac{P_{k-1}\vec{x}_k}{1+\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k}\\ P_k =& (I-K_k\vec{x}_k^\mathrm{T})P_{k-1} \\ \vec{\theta}_k =& \vec{\theta}_{k-1} +K_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \end{aligned}$$
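The three formulas above can be sketched without any matrix library for the 2-parameter case. The struct below is our own minimal illustration (not the article's zhnmat code), with $P_0$ initialized to a large multiple of the identity as discussed in the notes that follow:

```cpp
#include <cassert>
#include <cmath>

// Plain RLS for the 2-parameter model y = theta1*x + theta2, i.e. x_k = (x, 1)^T.
// Each update applies the three recursion formulas:
//   K_k = P x / (1 + x^T P x),  P <- (I - K x^T) P,  theta <- theta + K e.
struct Rls2 {
    double P[2][2] = {{1e6, 0.0}, {0.0, 1e6}};  // large initial P
    double theta[2] = {0.0, 0.0};
    void update(double x0, double x1, double y) {
        double Px[2] = {P[0][0]*x0 + P[0][1]*x1, P[1][0]*x0 + P[1][1]*x1};
        double denom = 1.0 + x0*Px[0] + x1*Px[1];
        double K[2] = {Px[0]/denom, Px[1]/denom};
        double xP[2] = {x0*P[0][0] + x1*P[1][0], x0*P[0][1] + x1*P[1][1]};
        for (int i = 0; i < 2; i++)              // P_k = (I - K_k x_k^T) P_{k-1}
            for (int j = 0; j < 2; j++)
                P[i][j] -= K[i] * xP[j];
        double e = y - (x0*theta[0] + x1*theta[1]);  // innovation y_k - x_k^T theta_{k-1}
        theta[0] += K[0] * e;
        theta[1] += K[1] * e;
    }
};

// Feed noise-free samples of y = 2*x + 1 and return the remaining
// estimation error; it should be tiny after a handful of samples.
double rls2_demo_error() {
    Rls2 rls;
    for (int k = 1; k <= 20; k++) {
        double x = 0.3 * k;
        rls.update(x, 1.0, 2.0*x + 1.0);
    }
    return std::fabs(rls.theta[0] - 2.0) + std::fabs(rls.theta[1] - 1.0);
}
```

With noise-free data the estimate converges to the true parameters up to the small bias introduced by the finite initial $P$.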
A Few Practical Notes
Because $P=(X^\mathrm{T}X)^{-1}$ and $\vec{x}_1\vec{x}_1^\mathrm{T}$ is generally small, in practice $P_1$ is usually initialized to a very large (approaching infinity) multiple of the identity matrix.
Note that $P$ is a symmetric matrix. If numerical errors cause $P$ to lose exact symmetry, add a step in the code: $P\leftarrow(P+P^\mathrm{T})/2$.
Introducing the Forgetting Factor
First consider a simpler case,
$$x_n=\theta+w_n$$
where $\theta$ is the parameter to be estimated, $w_n$ is Gaussian white noise, and $x_n$ is the observed data. With $N$ data points in total, averaging them gives an estimate of $\theta$:
$$\hat{\theta}=\frac{1}{N}\sum_{n=1}^Nx_n$$
In recursive form,
$$\hat\theta_n=\hat\theta_{n-1}+\frac{1}{n}(x_n-\hat\theta_{n-1})$$
For example,
$$\begin{aligned} \hat\theta_1 =& x_1\\ \hat\theta_2 =& \hat\theta_1+\frac{1}{2}(x_2-\hat\theta_1) =\frac{1}{2}(x_1+x_2) \\ \hat\theta_3 =& \hat\theta_2+\frac{1}{3}(x_3-\hat\theta_2) =\frac{1}{3}(x_1+x_2+x_3) \end{aligned}$$
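The equivalence between the recursion and the batch average can be checked in a few lines (a sketch we added; `recursive_mean` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Recursive mean: theta_n = theta_{n-1} + (x_n - theta_{n-1}) / n.
// Unrolling the recursion reproduces the batch average (x_1 + ... + x_N) / N.
double recursive_mean(const std::vector<double>& x) {
    double theta = 0.0;
    double n = 0.0;
    for (double xn : x) {
        n += 1.0;
        theta += (xn - theta) / n;
    }
    return theta;
}
```

For instance, `recursive_mean({1.0, 2.0, 6.0})` returns the batch mean 3.0.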
Now weight the earlier values by a decay coefficient $\lambda\in(0,1)$, that is,
$$\begin{aligned} \hat\theta_2 =& \frac{1}{\lambda+1}(\lambda x_1+x_2) \\ \hat\theta_3 =& \frac{1}{\lambda^2+\lambda+1}(\lambda^2x_1+\lambda x_2+x_3) \\ &\vdots \end{aligned}$$
In recursive form,
$$\begin{aligned} \hat\theta_2=&\frac{1}{\lambda+1}(\lambda x_1+x_2) \\ \hat\theta_3=&\frac{1}{\lambda^2+\lambda+1}(\lambda(\lambda+1)\hat\theta_2+x_3) \\ =&\hat\theta_2+\frac{1}{\lambda^2+\lambda+1}(x_3-\hat\theta_2) \end{aligned}$$
As $n\rightarrow\infty$, the recursion can be approximated by
$$\begin{aligned} \hat{\theta}_n =& \hat\theta_{n-1}+\frac{1}{\sum_{i=0}^{n-1}\lambda^{i}}(x_n-\hat\theta_{n-1}) \\ =& \hat\theta_{n-1}+(1-\lambda)(x_n-\hat\theta_{n-1}) \end{aligned}$$
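This limiting update is ordinary exponential smoothing with gain $1-\lambda$: for a constant input, the estimation error shrinks by a factor of $\lambda$ per step. A small sketch (ours, not from the original):

```cpp
#include <cassert>
#include <cmath>

// Limiting update: theta_n = theta_{n-1} + (1 - lambda) * (x_n - theta_{n-1}),
// i.e. exponential smoothing. For a constant input x and theta_0 = 0, the
// remaining error after `steps` updates is exactly x * lambda^steps.
double ewma_error_after(double x, double lambda, int steps) {
    double theta = 0.0;
    for (int n = 0; n < steps; n++)
        theta += (1.0 - lambda) * (x - theta);
    return x - theta;
}
```

With $x=1$ and $\lambda=0.5$, ten updates leave an error of $0.5^{10}$, i.e. about $10^{-3}$.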
The Forgetting Factor in Recursive Least Squares
Analogously, introduce the forgetting factor into the least-squares error $S_i=\|y_i-\vec{x}_i^\mathrm{T}\vec\theta\|^2$, giving the total error (with 3 data points as an example)
$$J_3=\lambda^2S_1+\lambda S_2+S_3$$
The full set of equations can then be written as
$$\vec{y}_3=X_3\vec\theta$$
$$\begin{bmatrix} \lambda y_1 \\ \sqrt{\lambda}y_2 \\ y_3 \end{bmatrix} =\begin{bmatrix} \lambda x_{11} & \lambda x_{12} \\ \sqrt{\lambda}x_{21} & \sqrt{\lambda}x_{22} \\ x_{31} & x_{32} \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$$
(each row is scaled by the square root of its weight in $J_3$, so that the squared errors pick up the factors $\lambda^2$, $\lambda$, and $1$).
Substituting into the least-squares solution (1) as before, the recursive forms of Eqs. (2) and (3) become
$$\begin{aligned} X_2^\mathrm{T}X_2 =& \begin{bmatrix} \sqrt{\lambda}x_{11} & x_{21} \\ \sqrt{\lambda}x_{12} & x_{22} \end{bmatrix} \begin{bmatrix} \sqrt{\lambda}x_{11} & \sqrt{\lambda} x_{12} \\ x_{21} & x_{22} \end{bmatrix} \\ X_3^\mathrm{T}X_3 =& \begin{bmatrix} \lambda x_{11} & \sqrt{\lambda}x_{21} & x_{31} \\ \lambda x_{12} & \sqrt{\lambda}x_{22} & x_{32} \end{bmatrix} \begin{bmatrix} \lambda x_{11} & \lambda x_{12} \\ \sqrt{\lambda}x_{21} & \sqrt{\lambda} x_{22} \\ x_{31} & x_{32} \end{bmatrix} \\ =& \lambda X_2^\mathrm{T}X_2 + \vec{x}_3\vec{x}_3^\mathrm{T} \\ X_2^\mathrm{T}\vec{y}_2 =& \begin{bmatrix} \sqrt{\lambda}x_{11} & x_{21} \\ \sqrt{\lambda}x_{12} & x_{22} \end{bmatrix} \begin{bmatrix} \sqrt{\lambda}y_1 \\ y_2 \end{bmatrix} \\ X_3^\mathrm{T}\vec{y}_3 =& \begin{bmatrix} \lambda x_{11} & \sqrt{\lambda}x_{21} & x_{31} \\ \lambda x_{12} & \sqrt{\lambda}x_{22} & x_{32} \end{bmatrix} \begin{bmatrix} \lambda y_1 \\ \sqrt{\lambda}y_2 \\ y_3 \end{bmatrix} \\ =& \lambda X_2^\mathrm{T}\vec{y}_2 + \vec{x}_3y_3 \end{aligned}$$
That is,
$$\begin{aligned} P_k^{-1} =& \lambda P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\mathrm{T} \\ X_k^\mathrm{T}\vec{y}_k =& \lambda X_{k-1}^\mathrm{T}\vec{y}_{k-1} + \vec{x}_ky_k \end{aligned}$$
Substituting into Eq. (1) gives
$$\begin{aligned} \vec{\theta}_k =& P_kX_k^\mathrm{T}\vec{y}_k \\ =& P_k(\lambda X_{k-1}^\mathrm{T}\vec{y}_{k-1} + \vec{x}_ky_k) \\ =& P_k(\lambda P_{k-1}^{-1}\vec{\theta}_{k-1} + \vec{x}_ky_k) \\ =& P_k(P_k^{-1}\vec{\theta}_{k-1} -\vec{x}_k\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}+\vec{x}_ky_k) \\ =& \vec\theta_{k-1} + P_k\vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \\ =& \vec\theta_{k-1} + (\lambda P_{k-1}^{-1} + \vec{x}_k\vec{x}_k^\mathrm{T})^{-1} \vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \\ =& \vec\theta_{k-1}+\left(\frac{P_{k-1}}{\lambda} -\frac{\frac{P_{k-1}}{\lambda}\vec{x}_k\vec{x}_k^\mathrm{T} \frac{P_{k-1}}{\lambda}}{1+\vec{x}_k^\mathrm{T}\frac{P_{k-1}}{\lambda}\vec{x}_k}\right) \vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \\ =& \vec\theta_{k-1}+\left(\frac{P_{k-1}}{\lambda} -\frac{1}{\lambda}\frac{P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T}P_{k-1}} {\lambda + \vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k}\right) \vec{x}_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \end{aligned}$$
Let
$$K_k = \frac{P_{k-1}\vec{x}_k} {\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k}$$
Then
$$\begin{aligned} P_k =& \frac{1}{\lambda}(I-K_k\vec{x}_k^\mathrm{T})P_{k-1}\\ P_k\vec{x}_k =& \frac{1}{\lambda}(P_{k-1}\vec{x}_k -K_k\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k) \\ =& \frac{1}{\lambda}\frac{P_{k-1}\vec{x}_k (\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k) -P_{k-1}\vec{x}_k\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k} {\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& \frac{1}{\lambda}\frac{\lambda P_{k-1}\vec{x}_k} {\lambda+\vec{x}_k^\mathrm{T}P_{k-1}\vec{x}_k} \\ =& K_k \end{aligned}$$
Summarizing, the recursion is
$$\begin{aligned} K_k =& \frac{P_{k-1}\vec{x}_k}{\lambda+\vec{x}_k^\mathrm{T} P_{k-1}\vec{x}_k}\\ P_k =& \frac{1}{\lambda}(I-K_k\vec{x}_k^\mathrm{T})P_{k-1} \\ \vec{\theta}_k =& \vec{\theta}_{k-1} +K_k(y_k-\vec{x}_k^\mathrm{T}\vec{\theta}_{k-1}) \end{aligned}$$
Compared with the formulas without the forgetting factor, the third equation for $\vec{\theta}_k$ is unchanged, while the first two change slightly.
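To see why the forgetting factor matters, consider a scalar model $y_k=\theta x_k$ (so $P$ and $K$ are scalars) in which the true $\theta$ jumps halfway through the data. The sketch below is our own illustration with made-up numbers:

```cpp
#include <cassert>
#include <cmath>

// Scalar forgetting-factor RLS for y_k = theta * x_k (P and K are scalars):
//   K = P x / (lambda + x P x),  P <- (1 - K x) P / lambda,
//   theta <- theta + K (y - x theta).
// The true parameter jumps from th0 to th1 halfway through the data.
double ff_rls_track(double th0, double th1, double lambda, int steps) {
    double P = 1e6, theta = 0.0;
    for (int k = 0; k < steps; k++) {
        double x = 1.0;                            // constant regressor, for simplicity
        double y = (k < steps / 2 ? th0 : th1) * x;
        double K = P * x / (lambda + x * P * x);
        P = (1.0 - K * x) * P / lambda;
        theta += K * (y - x * theta);
    }
    return theta;
}
```

With `lambda = 0.5` the influence of old samples decays geometrically, so `ff_rls_track(1.0, 3.0, 0.5, 40)` settles very close to the new value 3.0; with $\lambda=1$ (no forgetting) the estimate would converge to the plain average and remain dragged toward the old value.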
An Example with Code
For the equation $y=ax^2+bx+c$, feed in multiple data points, with $x$ drawn from Gaussian noise. With 4 data points as an example,
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} =\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \\ x_4^2 & x_4 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$
The code uses the author's own matrix library zhnmat; see the zhnmat usage notes.
#include <iostream>
#include <cstdlib>
#include <cmath>
#include "zhnmat.hpp"
using namespace zhnmat;
using namespace std;
int main() {
    constexpr double a = 0.5;
    constexpr double b = 1.1;
    constexpr double c = 2.1;
    double U, V, randx;
    double y, lambda = 0.5;
    Mat K;
    Mat vx(3, 1);
    Mat P = eye(3) * 1e6;  // initialize P to a large multiple of the identity
    Mat theta(3, 1);
    for (int i = 0; i < 10; i++) {
        // Box-Muller transform: `randx` follows the standard normal distribution
        U = (rand() + 1.0) / (RAND_MAX + 1.0);
        V = (rand() + 1.0) / (RAND_MAX + 1.0);
        randx = sqrt(-2.0 * log(U)) * cos(6.283185307179586477 * V);
        randx *= 2;
        vx.set(0, 0, randx * randx);
        vx.set(1, 0, randx);
        vx.set(2, 0, 1);
        y = a * randx * randx + b * randx + c;
        // Forgetting-factor RLS update: K_k, P_k, theta_k
        K = P * vx / (lambda + (vx.T() * P * vx).at(0, 0));
        P = (eye(3) - K * vx.T()) * P / lambda;
        theta = theta + K * (y - (vx.T() * theta).at(0, 0));
        cout << "theta" << theta << endl;
    }
    return 0;
}