轻量级网络——MobileNetV2

1.MobileNetV2的介绍MobileNet v2网络是由google团队在2018年提出的，相比MobileNet V1网络，准确率更高，模型更小。网络中的亮点：Inverted Residuals （倒残差结构）Linear Bottlenecks（结构的最后一层采用线性层）2.MobileNetV2的结构1）Inverted Residuals在之前的ResNet残差结构是先用1x

Clichong

43254人浏览 · 2021-05-16 16:45:01

Clichong · 2021-05-16 16:45:01 发布

文章目录

1.MobileNetV2的介绍

MobileNet v2网络是由google团队在2018年提出的，相比MobileNet V1网络，准确率更高，模型更小。

网络中的亮点：

Inverted Residuals （倒残差结构）
Linear Bottlenecks（结构的最后一层采用线性层）

2.MobileNetV2的结构

1）Inverted Residuals

在这里插入图片描述
在之前的ResNet残差结构是先用1x1的卷积降维，再升维的操作。而在MobileNetV2中，是先升维，在降维的操作。

所以对于ResNet残差结构是两头大，中间小。而对于MobileNetV2结构是中间大，两头小的结构。

其中，在MobileNet结构中，采用了新的激活函数：ReLU6
在这里插入图片描述

2）Linear Bottlenecks

针对倒残差结构中，最后一层的卷积层，采用了线性的激活函数，而不是ReLU激活函数。
在这里插入图片描述
一个解释是，ReLU激活函数对于低维的信息可能会造成比较大的瞬损失，而对于高维的特征信息造成的损失很小。而且由于倒残差结构是两头小中间大，所以输出的是一个低维的特征信息。所以使用一个线性的激活函数避免特征损失。

结构如下所示：
在这里插入图片描述
ps：当stride=1且输入特征矩阵与输出特征矩阵shape 相同时才有shortcut连接

shape的变化：其中的k是扩充因子
在这里插入图片描述

3.MobileNetV2的性能统计

Classification分类任务

其中MobileNetV2（1.4）中的1.4代表的是倍率因子也就是α，其中α是控制卷积层卷积核个数的超参数，β是控制输入图像的大小

可以看见，在CPU上分类一张图片主需要花75ms，基本上达到了实时性的要求。

Object Detection目标检测任务

可以看见，MobileNetV2的提出，已经基本上可以实现在移动设备或者是嵌入式设备来跑深度学习的模型了。将研究与日常生活结合了起来。

4.MobileNetV2的pytorch实现

MobileNetV2的网络结构
在这里插入图片描述
其中：

t是扩展因子，第一层1x1卷积层中卷积核的扩展倍率
c是输出特征矩阵深度channel
n是bottleneck的重复次数
s是步距（针对第一层，其他为1 ，与ResNet的类似，通过第一层的步长改变尺寸变化）

参考代码

import torch
import torch.nn as nn
import torchvision

# 分类个数
num_class = 5

# DW卷积
def Conv3x3BNReLU(in_channels,out_channels,stride,groups):
    return nn.Sequential(
            # stride=2 wh减半，stride=1 wh不变
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=stride, padding=1, groups=groups),
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True)
        )

# PW卷积
def Conv1x1BNReLU(in_channels,out_channels):
    return nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True)
        )

# # PW卷积(Linear) 没有使用激活函数
def Conv1x1BN(in_channels,out_channels):
    return nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels)
        )

class InvertedResidual(nn.Module):
    # t = expansion_factor,也就是扩展因子，文章中取6
    def __init__(self, in_channels, out_channels, expansion_factor, stride):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        self.in_channels = in_channels
        self.out_channels = out_channels
        mid_channels = (in_channels * expansion_factor)
        # print("expansion_factor:", expansion_factor)
        # print("mid_channels:",mid_channels)

        # 先1x1卷积升维，再1x1卷积降维
        self.bottleneck = nn.Sequential(
            # 升维操作: 扩充维度是 in_channels * expansion_factor (6倍)
            Conv1x1BNReLU(in_channels, mid_channels),
            # DW卷积,降低参数量
            Conv3x3BNReLU(mid_channels, mid_channels, stride, groups=mid_channels),
            # 降维操作: 降维度 in_channels * expansion_factor(6倍) 降维到指定 out_channels 维度
            Conv1x1BN(mid_channels, out_channels)
        )

        # 第一种: stride=1 才有shortcut 此方法让原本不相同的channels相同
        if self.stride == 1:
            self.shortcut = Conv1x1BN(in_channels, out_channels)

        # 第二种: stride=1 切 in_channels=out_channels 才有 shortcut
        # if self.stride == 1 and in_channels == out_channels:
        #     self.shortcut = ()

    def forward(self, x):
        out = self.bottleneck(x)
        # 第一种:
        out = (out+self.shortcut(x)) if self.stride==1 else out
        # 第二种:
        # out = (out + x) if self.stride == 1 and self.in_channels == self.out_channels else out
        return out

class MobileNetV2(nn.Module):
    # num_class为分类个数, t为扩充因子
    def __init__(self, num_classes=num_class, t=6):
        super(MobileNetV2,self).__init__()

        # 3 -> 32 groups=1 不是组卷积 单纯的卷积操作
        self.first_conv = Conv3x3BNReLU(3,32,2,groups=1)

        # 32 -> 16 stride=1 wh不变
        self.layer1 = self.make_layer(in_channels=32, out_channels=16, stride=1, factor=1, block_num=1)
        # 16 -> 24 stride=2 wh减半
        self.layer2 = self.make_layer(in_channels=16, out_channels=24, stride=2, factor=t, block_num=2)
        # 24 -> 32 stride=2 wh减半
        self.layer3 = self.make_layer(in_channels=24, out_channels=32, stride=2, factor=t, block_num=3)
        # 32 -> 64 stride=2 wh减半
        self.layer4 = self.make_layer(in_channels=32, out_channels=64, stride=2, factor=t, block_num=4)
        # 64 -> 96 stride=1 wh不变
        self.layer5 = self.make_layer(in_channels=64, out_channels=96, stride=1, factor=t, block_num=3)
        # 96 -> 160 stride=2 wh减半
        self.layer6 = self.make_layer(in_channels=96, out_channels=160, stride=2, factor=t, block_num=3)
        # 160 -> 320 stride=1 wh不变
        self.layer7 = self.make_layer(in_channels=160, out_channels=320, stride=1, factor=t, block_num=1)
        # 320 -> 1280 单纯的升维操作
        self.last_conv = Conv1x1BNReLU(320,1280)

        self.avgpool = nn.AvgPool2d(kernel_size=7,stride=1)
        self.dropout = nn.Dropout(p=0.2)
        self.linear = nn.Linear(in_features=1280,out_features=num_classes)
        self.init_params()

    def make_layer(self, in_channels, out_channels, stride, factor, block_num):
        layers = []
        # 与ResNet类似，每层Bottleneck单独处理，指定stride。此层外的stride均为1
        layers.append(InvertedResidual(in_channels, out_channels, factor, stride))
        # 这些叠加层stride均为1，in_channels = out_channels, 其中 block_num-1 为重复次数
        for i in range(1, block_num):
            layers.append(InvertedResidual(out_channels, out_channels, factor, 1))
        return nn.Sequential(*layers)

    # 初始化权重操作
    def init_params(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear) or isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.first_conv(x)  # torch.Size([1, 32, 112, 112])
        x = self.layer1(x)      # torch.Size([1, 16, 112, 112])
        x = self.layer2(x)      # torch.Size([1, 24, 56, 56])
        x = self.layer3(x)      # torch.Size([1, 32, 28, 28])
        x = self.layer4(x)      # torch.Size([1, 64, 14, 14])
        x = self.layer5(x)      # torch.Size([1, 96, 14, 14])
        x = self.layer6(x)      # torch.Size([1, 160, 7, 7])
        x = self.layer7(x)      # torch.Size([1, 320, 7, 7])
        x = self.last_conv(x)   # torch.Size([1, 1280, 7, 7])
        x = self.avgpool(x)     # torch.Size([1, 1280, 1, 1])
        x = x.view(x.size(0),-1)    # torch.Size([1, 1280])
        x = self.dropout(x)
        x = self.linear(x)      # torch.Size([1, 5])
        return x


if __name__=='__main__':
    model = MobileNetV2()
    # model = torchvision.models.MobileNetV2()
    # print(model)

    input = torch.randn(1, 3, 224, 224)
    out = model(input)
    print(out.shape)