点击上方,选择星标置顶,不定期资源大放送

阅读大概需要15分钟

Follow小博主,每天更新前沿干货

【导读】2020年,在各大CV顶会上又出现了许多基于ResNet改进的工作,比如:Res2Net,ResNeSt,IResNet,SCNet等等。为了更好的了解ResNet整个体系脉络的发展,我们开设了一个最强ResNet改进系列专题,主要为大家介绍2020年最新发表在顶会顶刊上基于ResNet改进的论文,这些论文的创新点很值得参考借鉴!本文是【最强ResNet改进系列】第一篇文章,本文我们将着重讲解Res2Net,该论文已被TPAMI2020录用,另外ResNeSt的论文解读见:【CV中的注意力机制】史上最强"ResNet"变体--ResNeSt,下一篇我们将直接来讲解IResNet

Res2Net

  • 论文链接:https://arxiv.org/pdf/1904.01169.pdf

  • 代码地址:https://github.com/Res2Net/Res2Net-PretrainedModels

论文摘要与创新点

在诸多视觉任务中,提取多尺度特征非常重要。backbone卷积神经网络(CNN)的最新进展中,也在不断增加更强大的多尺度特征表示能力,从而在更广泛的应用中实现了一致的性能提升。然而,大多数现有的网络架构都是在一层一层的基础上使用了多尺度。在本文中,我们提出了一个简单而有效的多尺度处理方法。与大多数现有的在神经网络通过分层来表示多尺度特征学习的方法不同,我们在更细粒度的层次上提高了网络神经网络的多尺度表示能力。作者提出了一种新颖的CNN模块,叫作Res2Net,在单个残差块内构造具有等级制的类似残差连接,取代了通用的单个3x3卷积核。Res2Net在更细粒度级别表示多尺度特征,并增加了每个网络层的感受野。可以将Res2Net模块插入最新的主干CNN模型中,例如ResNet,ResNeXt和DLA。我们在这些模型上评估Res2Net块,并在广泛使用的数据集(例如CIFAR-100和ImageNet)上进行了实验,该方法性能优于原先基准模型。在一些具有代表性的计算机视觉任务,进行了大量的消融实验,结果展示,Res2Net的模型效果超越了目前最先进的基准模型。


创新点:

  1.  不同于之前的网络结构利用不同分辨率的特征来提高多尺度能力,我们提出的多尺度方法可以在更细粒度级别上表示多尺度特征,并增加每个网络的感受野。

  2. 我们用一组更小的滤波器组替换了n个通道的3×3卷积核,每个都是w个通道(我们使用n = s×w不失一般性)。如图2所示,这些较小的滤波器组以类残差的层次化方式连接,以增加输出特征所能代表的尺度的数量。

  3. Res2Net策略引出了一个新的维度,即scale (Res2Net块中特征组的数量),作为现有网络中的深度、宽度和基数维度之外的一个重要因素。文中实验证明,增加Scale比增加其他维度更有效。请注意,所提出的方法在更细粒度的层次上利用了多尺度的潜力,这与利用分层操作的现有方法是正交的。因此,所提出的构建块,即Res2Net模块,可以很容易地插入到许多现有的CNN架构中。

模块介绍:

上图a是ResNet网络,图b是Res2Net,可以看出后者明显在残差单元(residual block)中插入更多带层级的残差连接结构(hierarchical residual-like connections)。具体来说,我们将一组3×3的卷积核替换为更小的一组过滤器,同时以分层的类残差方式来连接不同的过滤器组。

在第一个1x1卷积后,将输入划分到s个子集,定义为  ,每一个特征都有相同的尺度大小,但通道是输入特征的1/s,除了  其他的子特征都有相应的3x3卷积核,定义为  ,其输出为  。子特征  都和  相加,然后输入到  。为了在增加s时还减小参数,省略了  的3x3卷积。因此  可以写作:

 

每一个3x3的卷积操作都可以潜在的接受所有其左边的特征信息, 每一个输出都能增大感受野,所以每一个Res2Net都能获取不同数量和不同感受野大小的特征组合。这里的s作为scale维的控制参数(control parameter),更大的s就允许更多的特征有更丰富的感受野大小被学习。

在Res2Net块中,一个单独残差块中的分层的残差连接使感受野在更细粒度级别上的变化能够捕获细节和全局特性。实验结果表明,Res2Net模块可以与网络设计相结合,进一步提高网络性能。

与其他模块结合:

在不同计算机视觉任务上的结果展示:

基于PyTorch的Res2Net代码实现

import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo
import torch
import torch.nn.functional as F
__all__ = ['Res2Net', 'res2net50']


model_urls = {
    'res2net50_26w_4s': 'https://shanghuagao.oss-cn-beijing.aliyuncs.com/res2net/res2net50_26w_4s-06e79181.pth',
    'res2net50_48w_2s': 'https://shanghuagao.oss-cn-beijing.aliyuncs.com/res2net/res2net50_48w_2s-afed724a.pth',
    'res2net50_14w_8s': 'https://shanghuagao.oss-cn-beijing.aliyuncs.com/res2net/res2net50_14w_8s-6527dddc.pth',
    'res2net50_26w_6s': 'https://shanghuagao.oss-cn-beijing.aliyuncs.com/res2net/res2net50_26w_6s-19041792.pth',
    'res2net50_26w_8s': 'https://shanghuagao.oss-cn-beijing.aliyuncs.com/res2net/res2net50_26w_8s-2c7c9f12.pth',
    'res2net101_26w_4s': 'https://shanghuagao.oss-cn-beijing.aliyuncs.com/res2net/res2net101_26w_4s-02a759a1.pth',
}


class Bottle2neck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, baseWidth=26, scale = 4, stype='normal'):
        """ Constructor
        Args:
            inplanes: input channel dimensionality
            planes: output channel dimensionality
            stride: conv stride. Replaces pooling layer.
            downsample: None when stride = 1
            baseWidth: basic width of conv3x3
            scale: number of scale.
            type: 'normal': normal set. 'stage': first block of a new stage.
        """
        super(Bottle2neck, self).__init__()

        width = int(math.floor(planes * (baseWidth/64.0)))
        self.conv1 = nn.Conv2d(inplanes, width*scale, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width*scale)
        
        if scale == 1:
          self.nums = 1
        else:
          self.nums = scale -1
        if stype == 'stage':
            self.pool = nn.AvgPool2d(kernel_size=3, stride = stride, padding=1)
        convs = []
        bns = []
        for i in range(self.nums):
          convs.append(nn.Conv2d(width, width, kernel_size=3, stride = stride, padding=1, bias=False))
          bns.append(nn.BatchNorm2d(width))
        self.convs = nn.ModuleList(convs)
        self.bns = nn.ModuleList(bns)

        self.conv3 = nn.Conv2d(width*scale, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stype = stype
        self.scale = scale
        self.width = width

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        spx = torch.split(out, self.width, 1)
        for i in range(self.nums):
          if i==0 or self.stype=='stage':
            sp = spx[i]
          else:
            sp = sp + spx[i]
          sp = self.convs[i](sp)
          sp = self.relu(self.bns[i](sp))
          if i==0:
            out = sp
          else:
            out = torch.cat((out, sp), 1)
        if self.scale != 1 and self.stype=='normal':
          out = torch.cat((out, spx[self.nums]),1)
        elif self.scale != 1 and self.stype=='stage':
          out = torch.cat((out, self.pool(spx[self.nums])),1)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

class Res2Net(nn.Module):

    def __init__(self, block, layers, baseWidth = 26, scale = 4, num_classes=1000):
        self.inplanes = 64
        super(Res2Net, self).__init__()
        self.baseWidth = baseWidth
        self.scale = scale
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample=downsample,
                        stype='stage', baseWidth = self.baseWidth, scale=self.scale))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, baseWidth = self.baseWidth, scale=self.scale))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x


def res2net50(pretrained=False, **kwargs):
    """Constructs a Res2Net-50 model.
    Res2Net-50 refers to the Res2Net-50_26w_4s.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = Res2Net(Bottle2neck, [3, 4, 6, 3], baseWidth = 26, scale = 4, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['res2net50_26w_4s']))
    return model

def res2net50_26w_4s(pretrained=False, **kwargs):
    """Constructs a Res2Net-50_26w_4s model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = Res2Net(Bottle2neck, [3, 4, 6, 3], baseWidth = 26, scale = 4, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['res2net50_26w_4s']))
    return model

def res2net101_26w_4s(pretrained=False, **kwargs):
    """Constructs a Res2Net-50_26w_4s model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = Res2Net(Bottle2neck, [3, 4, 23, 3], baseWidth = 26, scale = 4, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['res2net101_26w_4s']))
    return model

def res2net50_26w_6s(pretrained=False, **kwargs):
    """Constructs a Res2Net-50_26w_4s model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = Res2Net(Bottle2neck, [3, 4, 6, 3], baseWidth = 26, scale = 6, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['res2net50_26w_6s']))
    return model

def res2net50_26w_8s(pretrained=False, **kwargs):
    """Constructs a Res2Net-50_26w_4s model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = Res2Net(Bottle2neck, [3, 4, 6, 3], baseWidth = 26, scale = 8, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['res2net50_26w_8s']))
    return model

def res2net50_48w_2s(pretrained=False, **kwargs):
    """Constructs a Res2Net-50_48w_2s model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = Res2Net(Bottle2neck, [3, 4, 6, 3], baseWidth = 48, scale = 2, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['res2net50_48w_2s']))
    return model

def res2net50_14w_8s(pretrained=False, **kwargs):
    """Constructs a Res2Net-50_14w_8s model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = Res2Net(Bottle2neck, [3, 4, 6, 3], baseWidth = 14, scale = 8, **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['res2net50_14w_8s']))
    return model



if __name__ == '__main__':
    images = torch.rand(1, 3, 224, 224).cuda(0)
    model = res2net101_26w_4s(pretrained=True)
    model = model.cuda(0)
    print(model(images).size())

推荐阅读:

重磅!DLer-计算机视觉交流群已成立!

欢迎各位Cver加入计算机视觉微信交流大群,本群旨在交流图像分类、目标检测、点云/语义分割、目标跟踪、机器视觉、GAN、超分辨率、人脸检测与识别、动作行为/时空/光流/姿态/运动、模型压缩/量化/剪枝、NAS、迁移学习、人体姿态估计等内容。更有真实项目需求对接、求职内推、算法竞赛、干货资讯汇总、行业技术交流等,欢迎加群交流学习!

进群请备注:研究方向+学校/公司+昵称(如图像分类+上交+小明)

广告商、博主请绕道!

???? 长按识别添加,邀请您进群!

Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐