微分动态规划(Differential Dynamic Programming, DDP)
微分动态规划(Differential Dynamic Programming, DDP)变分法思路由贝尔曼最优性原理得到:V(X,t)=minu∈Ω{ϕ[X(tf),tf]+∫t0tfL(x(t),u(t),t)dt}=minu∈Ω{∫t0t0+dtL(x(τ),u(τ),τ)dτ+V(X+ΔX,t+dt)}\begin{aligned}V(X, t)&=\min ...
微分动态规划(Differential Dynamic Programming, DDP)
变分法思路
由贝尔曼最优性原理得到:
V
(
X
,
t
)
=
min
u
∈
Ω
{
ϕ
[
X
(
t
f
)
,
t
f
]
+
∫
t
0
t
f
L
(
x
(
t
)
,
u
(
t
)
,
t
)
d
t
}
=
min
u
∈
Ω
{
∫
t
0
t
0
+
d
t
L
(
x
(
τ
)
,
u
(
τ
)
,
τ
)
d
τ
+
V
(
X
+
Δ
X
,
t
+
d
t
)
}
\begin{aligned} V(X, t)&=\min _{u \in \Omega}\left\{\phi\left[X\left(t_{f}\right), t_{f}\right]+\int_{t_{0}}^{t_{f}} L(x(t), u(t), t) d t\right\}\\ &=\min _{u \in \Omega}\left\{\int_{t_{0}}^{t_{0}+d t} L(x(\tau), u(\tau), \tau) d \tau+V(X+\Delta X, t+d t)\right\} \end{aligned}
V(X,t)=u∈Ωmin{ϕ[X(tf),tf]+∫t0tfL(x(t),u(t),t)dt}=u∈Ωmin{∫t0t0+dtL(x(τ),u(τ),τ)dτ+V(X+ΔX,t+dt)}
离散化得:
x
k
+
1
=
f
(
x
k
,
u
k
)
V
k
=
min
u
[
l
(
x
k
,
u
k
)
+
V
k
+
1
(
x
k
+
1
)
]
\begin{array}{l}{x_{k+1}=f\left(x_{k}, u_{k}\right)} \\ \\{V_{k}=\min _{u}\left[l\left(x_{k},u_{k}\right)+V_{k+1}\left(x_{k+1}\right)\right]}\end{array}
xk+1=f(xk,uk)Vk=minu[l(xk,uk)+Vk+1(xk+1)]
对
k
k
k时刻的值函数
V
(
x
,
k
)
V(x,k)
V(x,k)做变分,令
Q
(
δ
x
,
δ
u
)
=
V
k
(
x
+
δ
x
)
−
V
k
(
x
)
Q(\delta x,\delta u)=V_k(x+\delta x)-V_k(x)
Q(δx,δu)=Vk(x+δx)−Vk(x),对
Q
Q
Q在标称轨迹
(
x
k
,
u
k
)
(x_k,u_k)
(xk,uk)进行二阶泰勒展开:
Q
(
δ
x
,
δ
u
)
=
V
(
x
+
δ
x
)
−
V
(
x
)
=
l
(
x
k
+
δ
x
k
,
u
k
+
δ
u
k
)
+
V
k
+
1
(
x
k
+
1
+
δ
x
k
+
1
)
−
(
l
(
x
k
,
u
k
)
+
V
k
+
1
(
x
k
+
1
)
)
≈
δ
x
k
T
l
x
k
+
δ
u
k
T
l
u
k
+
1
2
(
δ
x
k
T
l
x
x
k
δ
x
k
+
2
δ
x
k
T
l
x
u
k
δ
u
k
+
δ
u
k
T
l
u
u
k
δ
u
k
)
+
δ
x
k
+
1
T
V
x
k
+
1
+
1
2
δ
x
k
+
1
T
V
x
x
k
+
1
δ
x
k
+
1
\begin{aligned} Q(\delta x, \delta u)&=V(x+\delta x)-V(x)\\ &=l\left(x_{k}+\delta x_{k}, u_{k}+\delta u_{k}\right)+V_{k+1}\left(x_{k+1}+\delta x_{k+1}\right)-\left(l\left(x_{k}, u_{k}\right)+V_{k+1}\left(x_{k+1}\right)\right)\\ &\approx \delta x_{k}^{T} l_{x_{k}}+\delta u_{k}^{T} l_{u_{k}}+\frac{1}{2}\left(\delta x_{k}^{T} l_{x x_{k}} \delta x_{k}+2 \delta x_{k}^{T} l_{x u_{k}} \delta u_{k}+\delta u_{k}^{T} l_{u u_{k}} \delta u_{k}\right)+\\&\qquad \qquad\qquad\qquad\quad\delta x_{k+1}^{T} V_{x_{k+1}}+\frac{1}{2} \delta x_{k+1}^{T} V_{x x_{k+1}} \delta x_{k+1} \end{aligned}
Q(δx,δu)=V(x+δx)−V(x)=l(xk+δxk,uk+δuk)+Vk+1(xk+1+δxk+1)−(l(xk,uk)+Vk+1(xk+1))≈δxkTlxk+δukTluk+21(δxkTlxxkδxk+2δxkTlxukδuk+δukTluukδuk)+δxk+1TVxk+1+21δxk+1TVxxk+1δxk+1
又有
x
k
+
1
=
f
(
x
k
,
u
k
)
x_{k+1}=f(x_k,u_k)
xk+1=f(xk,uk),二阶泰勒展开得到:
δ
x
k
+
1
=
δ
f
(
x
k
,
u
k
)
=
f
x
k
δ
x
k
+
f
u
k
δ
u
k
+
1
2
(
δ
x
k
T
f
x
x
k
δ
x
+
2
δ
x
k
T
f
x
u
k
δ
u
k
+
δ
u
k
T
f
u
u
k
δ
u
)
\delta x_{k+1}=\delta f\left(x_{k}, u_{k}\right)=f_{x_{k}} \delta x_{k}+f_{u_{k}} \delta u_{k}+\frac{1}{2}\left(\delta x_{k}^{T} f_{x x_{k}} \delta x+2 \delta x_{k}^{T} f_{x u_{k}} \delta u_{k}+\delta u_{k}^{T} f_{u u_{k}} \delta u\right)
δxk+1=δf(xk,uk)=fxkδxk+fukδuk+21(δxkTfxxkδx+2δxkTfxukδuk+δukTfuukδu)
将
δ
x
k
+
1
\delta x_{k+1}
δxk+1带入
Q
(
δ
x
,
δ
u
)
Q(\delta x,\delta u)
Q(δx,δu)中,得到:
Q
(
δ
x
,
δ
u
)
≈
1
2
[
1
δ
x
k
δ
u
k
]
T
[
0
Q
x
k
T
Q
u
k
T
Q
x
k
Q
x
x
k
Q
x
u
k
T
Q
u
k
Q
u
x
k
Q
u
u
k
]
[
1
δ
x
k
δ
u
k
]
Q(\delta x,\delta u) \approx\frac{1}{2}\left[\begin{array}{c}{1} \\ {\delta x_k} \\ {\delta u_k}\end{array}\right]^T\left[\begin{array}{ccc}{0} & {Q_{\mathrm{x}_k}^{T}} & {Q_{\mathrm{u}_k}^{T}} \\ {Q_{\mathrm{x}_k}} & {Q_{\mathrm{xx}_k}} & {Q_{\mathrm{xu}_k}^T} \\ {Q_{\mathrm{u}_k}} & {Q_{\mathrm{ux}_k}} & {Q_{\mathrm{uu}_k}}\end{array}\right]\left[\begin{array}{c}{1} \\ {\delta x_k} \\ {\delta u_k}\end{array}\right]
Q(δx,δu)≈21⎣⎡1δxkδuk⎦⎤T⎣⎡0QxkQukQxkTQxxkQuxkQukTQxukTQuuk⎦⎤⎣⎡1δxkδuk⎦⎤
其中:
Q
x
=
l
x
k
+
f
x
k
T
V
x
k
+
1
Q
u
=
l
u
k
+
f
u
k
T
V
x
k
+
1
Q
x
x
=
l
x
x
k
+
f
x
k
T
V
x
x
k
+
1
f
x
k
+
V
x
k
+
1
f
x
k
x
k
Q
u
u
=
l
u
u
k
+
f
u
k
T
V
x
x
k
+
1
f
u
k
+
V
x
k
+
1
f
u
k
u
k
Q
u
x
=
l
u
x
k
+
f
u
k
T
V
x
x
k
+
1
f
x
k
+
V
x
k
+
1
f
u
k
x
k
\begin{aligned} &Q_{x} =l_{x_{k}}+f_{x_{k}}^{T} V_{x_{k+1}} \\ &Q_{u} =l_{u_{k}}+f_{u_{k}}^{T} V_{x_{k+1}} \\ &Q_{x x} =l_{x x_{k}}+f_{x_{k}}^{T} V_{x x_{k+1}} f_{x_{k}}+V_{x_{k+1}} f_{x_{k} x_{k}} \\ &Q_{u u} =l_{u u_{k}}+f_{u_{k}}^{T} V_{x x_{k+1}} f_{u_{k}}+V_{x_{k+1}} f_{u_{k} u_{k}} \\ &Q_{u x} =l_{u x_{k}}+f_{u_{k}}^{T} V_{x x_{k+1}} f_{x_{k}}+V_{x_{k+1}} f_{u_{k} x_{k}} \end{aligned}
Qx=lxk+fxkTVxk+1Qu=luk+fukTVxk+1Qxx=lxxk+fxkTVxxk+1fxk+Vxk+1fxkxkQuu=luuk+fukTVxxk+1fuk+Vxk+1fukukQux=luxk+fukTVxxk+1fxk+Vxk+1fukxk
将
Q
(
δ
x
,
δ
u
)
Q(\delta x, \delta u)
Q(δx,δu)视为
δ
u
\delta u
δu的二次函数,令
∂
Q
(
δ
x
,
δ
u
)
δ
u
=
0
\frac{\partial Q(\delta x,\delta u)}{\delta u}=0
δu∂Q(δx,δu)=0,有:
∂
Q
(
δ
x
,
δ
u
)
δ
u
=
1
2
(
2
Q
u
u
k
δ
u
+
Q
u
x
k
δ
x
+
δ
x
T
Q
x
u
k
+
Q
u
k
+
Q
u
k
)
=
Q
u
u
k
δ
u
+
Q
u
x
k
δ
x
+
Q
u
k
=
0
\begin{aligned} \frac{\partial Q(\delta x, \delta u)}{\delta u} &= \frac{1}{2}(2Q_{\mathbf{uu}_k}\delta u+Q_{\mathbf{ux}_k}\delta x+\delta x^TQ_{\mathbf{xu}_k}+Q_{\mathbf{u}_k}+Q_{\mathbf{u} _k})\\ &=Q_{\mathbf{uu}_k}\delta u+Q_{\mathbf{ux}_k}\delta x+Q_{\mathbf{u}_k}\\ &=0 \end{aligned}
δu∂Q(δx,δu)=21(2Quukδu+Quxkδx+δxTQxuk+Quk+Quk)=Quukδu+Quxkδx+Quk=0
计算得:
δ
u
∗
=
−
Q
u
u
k
−
1
(
Q
u
k
+
Q
u
x
k
δ
x
)
\delta u^* = -Q_{\mathbf{uu}_k}^{-1}(Q_{\mathbf{u}_k}+Q_{\mathbf{ux}_k}\delta x)
δu∗=−Quuk−1(Quk+Quxkδx)
原理如下图:
令
k
=
−
Q
u
u
k
−
1
Q
u
k
,
K
=
−
Q
u
u
k
−
1
Q
u
x
k
k=-Q_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k},\ K=-Q_{\mathbf{uu}_k}^{-1}Q_{\mathbf{ux}_k}
k=−Quuk−1Quk, K=−Quuk−1Quxk,则
δ
u
∗
=
arg
min
δ
u
Q
(
δ
x
,
δ
u
)
=
k
+
K
δ
x
\delta u^{*}=\underset{\delta u}{\arg \min } Q(\delta x, \delta u)=k+K \delta x
δu∗=δuargminQ(δx,δu)=k+Kδx,将其带入
Q
(
δ
x
,
δ
u
)
Q(\delta x,\delta u)
Q(δx,δu),可以整理成如下形式:
Q
(
δ
x
,
δ
u
)
≈
Δ
V
+
V
x
k
T
δ
x
+
1
2
!
δ
x
T
V
x
x
k
δ
x
Q(\delta x,\delta u)\approx\Delta V+V_{\mathbf{x}_k}^T\delta{x}+\frac{1}{2!}\delta x^TV_{\mathbf{xx}_k}\delta x
Q(δx,δu)≈ΔV+VxkTδx+2!1δxTVxxkδx
对比可以得到:
Δ
V
=
−
1
2
Q
u
k
T
Q
u
u
k
−
1
Q
u
k
V
x
k
=
Q
x
k
−
Q
u
x
k
T
Q
u
u
k
−
1
Q
u
k
V
x
x
k
=
Q
x
x
k
−
Q
x
u
k
T
Q
u
u
k
−
1
Q
u
x
k
\begin{aligned} &\Delta V=-\frac{1}{2}Q_{\mathbf{u}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k}\\ &V_{\mathbf{x}_k}=Q_{\mathbf{x}_k}-Q_{\mathbf{ux}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k}\\ &V_{\mathbf{xx}_k}=Q_{\mathbf{xx}_k}-Q_{\mathbf{xu}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{ux}_k} \end{aligned}
ΔV=−21QukTQuuk−1QukVxk=Qxk−QuxkTQuuk−1QukVxxk=Qxxk−QxukTQuuk−1Quxk
其中
Q
u
u
k
Q_{\mathbf{uu}_k}
Quuk是半正定对称阵。
DDP伪代码:
- 由给定的控制序列
u
ˉ
k
\bar{u}_k
uˉk,正向得带计算标称轨迹。
x ˉ k + 1 = f ( x ˉ k , u ˉ k ) l x k , l u k , l x x k , l u x k , l u u k f x k , f u k , f x k x k , f u k u k , f u k x k \begin{aligned} &\bar{x}_{k+1}=f(\bar{x}_k,\bar{u}_k)\\ &l_{x_k},l_{u_k},l_{xx_k},l_{ux_k},l_{uu_k}\\ &f_{x_k},f_{u_k},f_{x_kx_k},f_{u_ku_k},f_{u_kx_k} \end{aligned} xˉk+1=f(xˉk,uˉk)lxk,luk,lxxk,luxk,luukfxk,fuk,fxkxk,fukuk,fukxk - 反向迭代:从
T
T
T到
1
1
1迭代
- 计算 V x k + 1 V_{x_{k+1}} Vxk+1, V x x k + 1 V_{xx_{k+1}} Vxxk+1
- 计算 Q Q Q函数
- 计算 δ u k ∗ \delta u_k^* δuk∗,得到 k k k_k kk, K k K_k Kk
- 正向迭代更新控制序列
x 1 = x ˉ ( 1 ) u k = u ˉ k + k k + K k ( x k − x ˉ k ) x k + 1 = f ( x k , u k ) \begin{aligned} &x_1 = \bar{x}(1)\\ &u_k=\bar{u}_k+k_k+K_k(x_k-\bar{x}_k)\\ &x_{k+1}=f(x_k,u_k) \end{aligned} x1=xˉ(1)uk=uˉk+kk+Kk(xk−xˉk)xk+1=f(xk,uk) - 是否收敛,否就跳转1,是就结束
开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!
更多推荐
所有评论(0)