微分动态规划(Differential Dynamic Programming, DDP)

变分法思路

由贝尔曼最优性原理得到:
V ( X , t ) = min ⁡ u ∈ Ω { ϕ [ X ( t f ) , t f ] + ∫ t 0 t f L ( x ( t ) , u ( t ) , t ) d t } = min ⁡ u ∈ Ω { ∫ t 0 t 0 + d t L ( x ( τ ) , u ( τ ) , τ ) d τ + V ( X + Δ X , t + d t ) } \begin{aligned} V(X, t)&=\min _{u \in \Omega}\left\{\phi\left[X\left(t_{f}\right), t_{f}\right]+\int_{t_{0}}^{t_{f}} L(x(t), u(t), t) d t\right\}\\ &=\min _{u \in \Omega}\left\{\int_{t_{0}}^{t_{0}+d t} L(x(\tau), u(\tau), \tau) d \tau+V(X+\Delta X, t+d t)\right\} \end{aligned} V(X,t)=uΩmin{ϕ[X(tf),tf]+t0tfL(x(t),u(t),t)dt}=uΩmin{t0t0+dtL(x(τ),u(τ),τ)dτ+V(X+ΔX,t+dt)}
离散化得:
x k + 1 = f ( x k , u k ) V k = min ⁡ u [ l ( x k , u k ) + V k + 1 ( x k + 1 ) ] \begin{array}{l}{x_{k+1}=f\left(x_{k}, u_{k}\right)} \\ \\{V_{k}=\min _{u}\left[l\left(x_{k},u_{k}\right)+V_{k+1}\left(x_{k+1}\right)\right]}\end{array} xk+1=f(xk,uk)Vk=minu[l(xkuk)+Vk+1(xk+1)]
k k k时刻的值函数 V ( x , k ) V(x,k) V(x,k)做变分,令 Q ( δ x , δ u ) = V k ( x + δ x ) − V k ( x ) Q(\delta x,\delta u)=V_k(x+\delta x)-V_k(x) Q(δx,δu)=Vk(x+δx)Vk(x),对 Q Q Q在标称轨迹 ( x k , u k ) (x_k,u_k) (xk,uk)进行二阶泰勒展开:
Q ( δ x , δ u ) = V ( x + δ x ) − V ( x ) = l ( x k + δ x k , u k + δ u k ) + V k + 1 ( x k + 1 + δ x k + 1 ) − ( l ( x k , u k ) + V k + 1 ( x k + 1 ) ) ≈ δ x k T l x k + δ u k T l u k + 1 2 ( δ x k T l x x k δ x k + 2 δ x k T l x u k δ u k + δ u k T l u u k δ u k ) + δ x k + 1 T V x k + 1 + 1 2 δ x k + 1 T V x x k + 1 δ x k + 1 \begin{aligned} Q(\delta x, \delta u)&=V(x+\delta x)-V(x)\\ &=l\left(x_{k}+\delta x_{k}, u_{k}+\delta u_{k}\right)+V_{k+1}\left(x_{k+1}+\delta x_{k+1}\right)-\left(l\left(x_{k}, u_{k}\right)+V_{k+1}\left(x_{k+1}\right)\right)\\ &\approx \delta x_{k}^{T} l_{x_{k}}+\delta u_{k}^{T} l_{u_{k}}+\frac{1}{2}\left(\delta x_{k}^{T} l_{x x_{k}} \delta x_{k}+2 \delta x_{k}^{T} l_{x u_{k}} \delta u_{k}+\delta u_{k}^{T} l_{u u_{k}} \delta u_{k}\right)+\\&\qquad \qquad\qquad\qquad\quad\delta x_{k+1}^{T} V_{x_{k+1}}+\frac{1}{2} \delta x_{k+1}^{T} V_{x x_{k+1}} \delta x_{k+1} \end{aligned} Q(δx,δu)=V(x+δx)V(x)=l(xk+δxk,uk+δuk)+Vk+1(xk+1+δxk+1)(l(xk,uk)+Vk+1(xk+1))δxkTlxk+δukTluk+21(δxkTlxxkδxk+2δxkTlxukδuk+δukTluukδuk)+δxk+1TVxk+1+21δxk+1TVxxk+1δxk+1
又有 x k + 1 = f ( x k , u k ) x_{k+1}=f(x_k,u_k) xk+1=f(xk,uk),二阶泰勒展开得到:
δ x k + 1 = δ f ( x k , u k ) = f x k δ x k + f u k δ u k + 1 2 ( δ x k T f x x k δ x + 2 δ x k T f x u k δ u k + δ u k T f u u k δ u ) \delta x_{k+1}=\delta f\left(x_{k}, u_{k}\right)=f_{x_{k}} \delta x_{k}+f_{u_{k}} \delta u_{k}+\frac{1}{2}\left(\delta x_{k}^{T} f_{x x_{k}} \delta x+2 \delta x_{k}^{T} f_{x u_{k}} \delta u_{k}+\delta u_{k}^{T} f_{u u_{k}} \delta u\right) δxk+1=δf(xk,uk)=fxkδxk+fukδuk+21(δxkTfxxkδx+2δxkTfxukδuk+δukTfuukδu)
δ x k + 1 \delta x_{k+1} δxk+1带入 Q ( δ x , δ u ) Q(\delta x,\delta u) Q(δx,δu)中,得到:
Q ( δ x , δ u ) ≈ 1 2 [ 1 δ x k δ u k ] T [ 0 Q x k T Q u k T Q x k Q x x k Q x u k T Q u k Q u x k Q u u k ] [ 1 δ x k δ u k ] Q(\delta x,\delta u) \approx\frac{1}{2}\left[\begin{array}{c}{1} \\ {\delta x_k} \\ {\delta u_k}\end{array}\right]^T\left[\begin{array}{ccc}{0} & {Q_{\mathrm{x}_k}^{T}} & {Q_{\mathrm{u}_k}^{T}} \\ {Q_{\mathrm{x}_k}} & {Q_{\mathrm{xx}_k}} & {Q_{\mathrm{xu}_k}^T} \\ {Q_{\mathrm{u}_k}} & {Q_{\mathrm{ux}_k}} & {Q_{\mathrm{uu}_k}}\end{array}\right]\left[\begin{array}{c}{1} \\ {\delta x_k} \\ {\delta u_k}\end{array}\right] Q(δx,δu)211δxkδukT0QxkQukQxkTQxxkQuxkQukTQxukTQuuk1δxkδuk
其中:
Q x = l x k + f x k T V x k + 1 Q u = l u k + f u k T V x k + 1 Q x x = l x x k + f x k T V x x k + 1 f x k + V x k + 1 f x k x k Q u u = l u u k + f u k T V x x k + 1 f u k + V x k + 1 f u k u k Q u x = l u x k + f u k T V x x k + 1 f x k + V x k + 1 f u k x k \begin{aligned} &Q_{x} =l_{x_{k}}+f_{x_{k}}^{T} V_{x_{k+1}} \\ &Q_{u} =l_{u_{k}}+f_{u_{k}}^{T} V_{x_{k+1}} \\ &Q_{x x} =l_{x x_{k}}+f_{x_{k}}^{T} V_{x x_{k+1}} f_{x_{k}}+V_{x_{k+1}} f_{x_{k} x_{k}} \\ &Q_{u u} =l_{u u_{k}}+f_{u_{k}}^{T} V_{x x_{k+1}} f_{u_{k}}+V_{x_{k+1}} f_{u_{k} u_{k}} \\ &Q_{u x} =l_{u x_{k}}+f_{u_{k}}^{T} V_{x x_{k+1}} f_{x_{k}}+V_{x_{k+1}} f_{u_{k} x_{k}} \end{aligned} Qx=lxk+fxkTVxk+1Qu=luk+fukTVxk+1Qxx=lxxk+fxkTVxxk+1fxk+Vxk+1fxkxkQuu=luuk+fukTVxxk+1fuk+Vxk+1fukukQux=luxk+fukTVxxk+1fxk+Vxk+1fukxk
Q ( δ x , δ u ) Q(\delta x, \delta u) Q(δx,δu)视为 δ u \delta u δu的二次函数,令 ∂ Q ( δ x , δ u ) δ u = 0 \frac{\partial Q(\delta x,\delta u)}{\delta u}=0 δuQ(δx,δu)=0,有:
∂ Q ( δ x , δ u ) δ u = 1 2 ( 2 Q u u k δ u + Q u x k δ x + δ x T Q x u k + Q u k + Q u k ) = Q u u k δ u + Q u x k δ x + Q u k = 0 \begin{aligned} \frac{\partial Q(\delta x, \delta u)}{\delta u} &= \frac{1}{2}(2Q_{\mathbf{uu}_k}\delta u+Q_{\mathbf{ux}_k}\delta x+\delta x^TQ_{\mathbf{xu}_k}+Q_{\mathbf{u}_k}+Q_{\mathbf{u} _k})\\ &=Q_{\mathbf{uu}_k}\delta u+Q_{\mathbf{ux}_k}\delta x+Q_{\mathbf{u}_k}\\ &=0 \end{aligned} δuQ(δx,δu)=21(2Quukδu+Quxkδx+δxTQxuk+Quk+Quk)=Quukδu+Quxkδx+Quk=0
计算得: δ u ∗ = − Q u u k − 1 ( Q u k + Q u x k δ x ) \delta u^* = -Q_{\mathbf{uu}_k}^{-1}(Q_{\mathbf{u}_k}+Q_{\mathbf{ux}_k}\delta x) δu=Quuk1(Quk+Quxkδx)
原理如下图:
在这里插入图片描述
k = − Q u u k − 1 Q u k ,   K = − Q u u k − 1 Q u x k k=-Q_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k},\ K=-Q_{\mathbf{uu}_k}^{-1}Q_{\mathbf{ux}_k} k=Quuk1Quk, K=Quuk1Quxk,则 δ u ∗ = arg ⁡ min ⁡ δ u Q ( δ x , δ u ) = k + K δ x \delta u^{*}=\underset{\delta u}{\arg \min } Q(\delta x, \delta u)=k+K \delta x δu=δuargminQ(δx,δu)=k+Kδx,将其带入 Q ( δ x , δ u ) Q(\delta x,\delta u) Q(δx,δu),可以整理成如下形式:
Q ( δ x , δ u ) ≈ Δ V + V x k T δ x + 1 2 ! δ x T V x x k δ x Q(\delta x,\delta u)\approx\Delta V+V_{\mathbf{x}_k}^T\delta{x}+\frac{1}{2!}\delta x^TV_{\mathbf{xx}_k}\delta x Q(δx,δu)ΔV+VxkTδx+2!1δxTVxxkδx
对比可以得到:
Δ V = − 1 2 Q u k T Q u u k − 1 Q u k V x k = Q x k − Q u x k T Q u u k − 1 Q u k V x x k = Q x x k − Q x u k T Q u u k − 1 Q u x k \begin{aligned} &\Delta V=-\frac{1}{2}Q_{\mathbf{u}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k}\\ &V_{\mathbf{x}_k}=Q_{\mathbf{x}_k}-Q_{\mathbf{ux}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k}\\ &V_{\mathbf{xx}_k}=Q_{\mathbf{xx}_k}-Q_{\mathbf{xu}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{ux}_k} \end{aligned} ΔV=21QukTQuuk1QukVxk=QxkQuxkTQuuk1QukVxxk=QxxkQxukTQuuk1Quxk
其中 Q u u k Q_{\mathbf{uu}_k} Quuk是半正定对称阵。

DDP伪代码:

  1. 由给定的控制序列 u ˉ k \bar{u}_k uˉk,正向得带计算标称轨迹。
    x ˉ k + 1 = f ( x ˉ k , u ˉ k ) l x k , l u k , l x x k , l u x k , l u u k f x k , f u k , f x k x k , f u k u k , f u k x k \begin{aligned} &\bar{x}_{k+1}=f(\bar{x}_k,\bar{u}_k)\\ &l_{x_k},l_{u_k},l_{xx_k},l_{ux_k},l_{uu_k}\\ &f_{x_k},f_{u_k},f_{x_kx_k},f_{u_ku_k},f_{u_kx_k} \end{aligned} xˉk+1=f(xˉk,uˉk)lxk,luk,lxxk,luxk,luukfxk,fuk,fxkxk,fukuk,fukxk
  2. 反向迭代:从 T T T 1 1 1迭代
    • 计算 V x k + 1 V_{x_{k+1}} Vxk+1 V x x k + 1 V_{xx_{k+1}} Vxxk+1
    • 计算 Q Q Q函数
    • 计算 δ u k ∗ \delta u_k^* δuk,得到 k k k_k kk K k K_k Kk
  3. 正向迭代更新控制序列
    x 1 = x ˉ ( 1 ) u k = u ˉ k + k k + K k ( x k − x ˉ k ) x k + 1 = f ( x k , u k ) \begin{aligned} &x_1 = \bar{x}(1)\\ &u_k=\bar{u}_k+k_k+K_k(x_k-\bar{x}_k)\\ &x_{k+1}=f(x_k,u_k) \end{aligned} x1=xˉ(1)uk=uˉk+kk+Kk(xkxˉk)xk+1=f(xk,uk)
  4. 是否收敛,否就跳转1,是就结束
Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐