Convolution (CNN)

Purpose: feature extraction

CNN architecture

CNN

Pooling

Dropout

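Weight sharing

Pooling: purpose and properties

  1. Downsampling (subsampling): shrinks the feature maps, which reduces the number of parameters in the following layers and helps prevent overfitting
  2. The smaller representation also lets the network tolerate small translations of the image, so the output depends less on exact position
  3. Hyperparameters: window size, stride, and padding type
  4. Pooling neurons have no weights; they only take the maximum or the mean of their window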
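
The points above can be sketched in a few lines of plain Python. This is only an illustration: the function name `pool2d`, its parameters, and the sample 4x4 image are assumptions made for this example, and padding is left out ("valid" pooling).

```python
def pool2d(image, size=2, stride=2, mode="max"):
    """Pool a 2-D list of numbers without padding ("valid" pooling)."""
    rows, cols = len(image), len(image[0])
    # Pooling has nothing to learn: it only reduces each window to one number.
    reduce_fn = max if mode == "max" else (lambda w: sum(w) / len(w))
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


# Hypothetical 4x4 input; a 2x2 window with stride 2 halves each dimension.
image = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled_max = pool2d(image, mode="max")   # [[6, 2], [2, 9]]
pooled_avg = pool2d(image, mode="avg")   # [[3.5, 1.0], [1.0, 6.0]]
```

The 16 input values shrink to 4, and moving a large value by one pixel inside its window leaves the max-pooled output unchanged.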

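Why a convolutional layer has few weight (W) parameters

  1. Local connectivity: each neuron connects only to a small patch of the input
  2. Weight sharing: the same kernel weights are reused at every position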
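
A rough parameter count makes the difference concrete. The layer sizes below (a 32x32 RGB input, 64 units or filters, 3x3 kernels) are assumptions chosen for illustration, not values from these notes.

```python
def dense_params(in_h, in_w, in_c, units):
    # Fully connected: every unit has a weight for every input value, plus a bias.
    return in_h * in_w * in_c * units + units


def conv_params(k_h, k_w, in_c, filters):
    # Convolution: each filter owns one small kernel (local connectivity)
    # that is reused at every position (weight sharing), plus a bias.
    return k_h * k_w * in_c * filters + filters


dense = dense_params(32, 32, 3, 64)   # 196672
conv = conv_params(3, 3, 3, 64)       # 1792
```

The convolutional count does not depend on the image size at all, only on the kernel size and the channel counts.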

Reverse-mode Autodiff

  1. n7 is the output node, so $f = n_7$ and $\frac{\partial f}{\partial n_7} = 1$.
  2. Node n5: $\frac{\partial f}{\partial n_5} = \frac{\partial f}{\partial n_7} \cdot \frac{\partial n_7}{\partial n_5} = 1 \cdot \frac{\partial n_7}{\partial n_5}$. Because $n_7 = n_5 + n_6$, we have $\frac{\partial n_7}{\partial n_5} = 1$,
    so $\frac{\partial f}{\partial n_5} = 1 \cdot 1 = 1$. By the same argument, $\frac{\partial f}{\partial n_6} = 1$.
  3. Node n4: $\frac{\partial f}{\partial n_4} = \frac{\partial f}{\partial n_5} \cdot \frac{\partial n_5}{\partial n_4} = 1 \cdot \frac{\partial n_5}{\partial n_4}$. Because $n_5 = n_4 \cdot n_2$, we have $\frac{\partial n_5}{\partial n_4} = n_2$, so $\frac{\partial f}{\partial n_4} = 1 \cdot n_2 = 4$.
  4. $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial n_4} \cdot \frac{\partial n_4}{\partial x} = 4 \cdot 2 n_1 = 24$ (here $n_1 = x = 3$ and $n_4 = n_1^2 = 9$).
  5. $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial n_5} \cdot \frac{\partial n_5}{\partial n_2} + \frac{\partial f}{\partial n_6} \cdot \frac{\partial n_6}{\partial n_2} = 1 \cdot n_4 + 1 \cdot 1 = 9 + 1 = 10$ (here $n_2 = y = 4$).
    (Figure: chain rule)
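
The walk-through above can be reproduced with a very small reverse-mode sketch. The `Node` class and helper names are made up for this example. The graph is assumed to be $f(x, y) = x^2 y + y + 2$ at $x = 3$, $y = 4$; the constant 2 in n3 is an assumption, since the notes only show that n6 depends on n2 with derivative 1.

```python
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0         # accumulates d(output)/d(this node)


def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def backward(output, topo_order):
    """Propagate gradients from the output back to the inputs.

    topo_order must list every node after all of its parents.
    """
    output.grad = 1.0
    for node in reversed(topo_order):
        for parent, local in node.parents:
            # Chain rule: upstream gradient times local derivative.
            parent.grad += node.grad * local


n1 = Node(3.0)     # x
n2 = Node(4.0)     # y
n3 = Node(2.0)     # constant (assumed)
n4 = mul(n1, n1)   # x^2
n5 = mul(n4, n2)   # x^2 * y
n6 = add(n2, n3)   # y + 2
n7 = add(n5, n6)   # f

backward(n7, [n1, n2, n3, n4, n5, n6, n7])
df_dx, df_dy = n1.grad, n2.grad   # 24.0 and 10.0
```

A single backward pass yields the derivatives with respect to every input, which is why reverse mode suits functions with many inputs and one output.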

Backpropagation

  1. $y = 1.00$ (the gradient at the output node is seeded with 1).
  2. $w_{-1} = y \cdot \frac{d}{dx}\left(\frac{1}{x}\right) = 1.00 \cdot \left(-\frac{1}{x^2}\right) = 1.00 \cdot \left(-\frac{1}{1.37^2}\right) = -0.53$, where $x = 1.37$.
  3. $w_{-2} = w_{-1} \cdot \frac{d}{dx}(x + 1) = -0.53 \cdot 1 = -0.53$, where $x = 0.37$.
  4. $w_{-3} = w_{-2} \cdot \frac{d}{dx} e^{x} = -0.53 \cdot e^{-1} = -0.20$, where $x = -1.00$.
  5. $w_{-4} = w_{-3} \cdot \frac{d}{dx}(-x) = -0.20 \cdot (-1) = 0.20$, where $x = 1.00$; the derivative is constant, so the value of x is not actually needed.
  6. $w_{-5a} = w_{-4} \cdot \frac{\partial}{\partial x_a}(x_a + x_b) = w_{-4} \cdot 1 = 0.20$
  7. $w_{-5b} = w_{-4} \cdot \frac{\partial}{\partial x_b}(x_a + x_b) = w_{-4} \cdot 1 = 0.20$
  8. $w_{-6a_1} = w_{-5a} \cdot \frac{\partial}{\partial x_{a1}}(x_{a1} + x_{a2}) = w_{-5a} = 0.20$
  9. $w_{-6a_2} = w_{-5a} \cdot \frac{\partial}{\partial x_{a2}}(x_{a1} + x_{a2}) = w_{-5a} = 0.20$
  10. $w_{-7a_{11}} = w_{-6a_1} \cdot \frac{\partial}{\partial w_0}(w_0 x_0) = 0.20 \cdot x_0 = -0.20$, where $x_0 = -1.00$.
  11. $w_{-7a_{12}} = w_{-6a_1} \cdot \frac{\partial}{\partial x_0}(w_0 x_0) = 0.20 \cdot w_0 = 0.40$, where $w_0 = 2.00$.
  12. $w_{-7a_{21}} = w_{-6a_2} \cdot \frac{\partial}{\partial w_1}(w_1 x_1) = 0.20 \cdot x_1 = -0.40$, where $x_1 = -2.00$.
  13. $w_{-7a_{22}} = w_{-6a_2} \cdot \frac{\partial}{\partial x_1}(w_1 x_1) = 0.20 \cdot w_1 = -0.60$, where $w_1 = -3.00$.
    (Figure: backpropagation)
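
The same pass can be checked numerically. The sketch below assumes the graph is a sigmoid neuron, $1 / (1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)})$; the bias `w2 = -3.0` is not stated in the list and is inferred from the forward values (the sum must equal 1.00). All variable names are illustrative.

```python
import math

w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0  # inferred bias, see the note above

# Forward pass, one elementary operation per line.
a1 = w0 * x0        # -2.00
a2 = w1 * x1        #  6.00
s = a1 + a2 + w2    #  1.00
neg = -s            # -1.00
e = math.exp(neg)   #  0.37
d = e + 1.0         #  1.37
y = 1.0 / d         #  0.73

# Backward pass: upstream gradient times local derivative at each step.
g_y = 1.0
g_d = g_y * (-1.0 / d ** 2)    # step 2, about -0.53
g_e = g_d * 1.0                # step 3
g_neg = g_e * math.exp(neg)    # step 4, about -0.20
g_s = g_neg * -1.0             # step 5, about 0.20
g_a1 = g_a2 = g_w2 = g_s       # steps 6 to 9: addition passes gradients through
g_w0 = g_a1 * x0               # step 10
g_x0 = g_a1 * w0               # step 11
g_w1 = g_a2 * x1               # step 12
g_x1 = g_a2 * w1               # step 13
```

Without rounding the intermediate 0.20, the last three gradients come out near 0.39, -0.39 and -0.59, slightly different from the hand-rounded 0.40, -0.40 and -0.60 in the list.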

VGG16

To be added

Sigmoid function

$f(x) = \frac{1}{1 + e^{-x}}$
(Figure: graph of the sigmoid function)
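
A direct translation of the sigmoid formula calls `math.exp(-x)`, which raises `OverflowError` for large negative x. The sketch below is one common workaround, not the only one: it switches to the algebraically equivalent form $e^{x} / (1 + e^{x})$ when x is negative.

```python
import math


def sigmoid(x):
    """Sigmoid 1 / (1 + e^-x), written so that exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is at most 1, so this branch is safe.
    z = math.exp(x)
    return z / (1.0 + z)


mid = sigmoid(0.0)        # 0.5
low = sigmoid(-1000.0)    # 0.0 after underflow, no exception
high = sigmoid(1000.0)    # 1.0
```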

Tanh function

$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
(Figure: graph of the tanh function)

Derivative of the tanh function

$f'(x) = 1 - [f(x)]^{2}$
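
The derivative formula can be sanity-checked against a central finite difference. This is a minimal sketch; the helper names and the step size `h` are arbitrary choices for the example.

```python
import math


def tanh(x):
    # Direct transcription of the definition above (fine for moderate x).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def tanh_grad(x):
    # Analytic derivative: 1 - f(x)^2.
    return 1.0 - tanh(x) ** 2


def numeric_grad(f, x, h=1e-5):
    # Central difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)


points = [-2.0, -0.5, 0.0, 1.5]
max_error = max(abs(tanh_grad(x) - numeric_grad(tanh, x)) for x in points)
```

`max_error` should be tiny, since the two gradients differ only by the finite-difference approximation error.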

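ReLU function

Has drawbacks: for example, a unit whose input stays negative outputs zero and receives zero gradient

Vanishing gradients

Exploding gradients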

Regularization

Normalization

Early stopping

Prevents overfitting

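Batch Normalization

Purpose:
1. Mitigates vanishing gradients
2. Makes training less sensitive to weight initialization
3. Allows a larger learning rate, which speeds up training
4. Acts as a form of regularization and improves generalization, so Dropout is often unnecessary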
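
For reference, here is a minimal sketch of the training-time forward pass for a single feature. It assumes the standard formulation (normalize with the mini-batch mean and variance, then scale by `gamma` and shift by `beta`); the running statistics used at inference time and the backward pass are left out, and the sample values are made up.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n  # biased batch variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


activations = [2.0, 4.0, 6.0, 8.0]   # hypothetical pre-activation values
normalized = batch_norm(activations)

out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
```

Whatever the scale of the incoming activations, the output has mean close to 0 and variance close to 1, which is what makes later layers less sensitive to initialization and to the learning rate.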
