Faster R-CNN code walkthrough (3): rpn_head

shchojj · published 2020-04-07 16:57:41

This Faster R-CNN code walkthrough is based on:

https://github.com/adityaarun1/pytorch_fast-er_rcnn

https://github.com/jwyang/faster-rcnn.pytorch

The main job of rpn_head is to classify the anchors on the feature map as background or foreground.

import torch
import torch.nn as nn
import torch.nn.functional as F

# cfg is the project's configuration dict, defined elsewhere in the repository

class RpnHead(nn.Module):
    def __init__(self, in_channels=512):
        super(RpnHead, self).__init__()
        self.anchor_scales = cfg['anchors_scales']
        self.anchor_ratios = cfg['anchor_ratios']

        # number of anchors per feature-map pixel (3 * 3 anchors per pixel)
        self.num_anchors = len(self.anchor_scales) * len(self.anchor_ratios)
        # number of bg/fg score channels: each anchor gets 2 scores (bg and fg)
        self.nc_score_out = len(self.anchor_scales) * len(self.anchor_ratios) * 2

        # 3x3 conv (followed by ReLU in forward) that processes the input feature map
        self.rpn_net = nn.Conv2d(in_channels, in_channels, [3, 3], padding=1)

        # 1x1 conv for the bg/fg classification scores, 2 per anchor
        self.rpn_cls_score_net = nn.Conv2d(in_channels, self.num_anchors * 2, [1, 1])
        # 1x1 conv for the box regression, 4 coordinates per anchor
        self.rpn_bbox_pred_net = nn.Conv2d(in_channels, self.num_anchors * 4, [1, 1])


    @staticmethod
    def reshape(x, d):
        # [b, c, h, w] -> [b, d, c * h / d, w]; only the shape changes, not the data order
        input_shape = x.size()
        x = x.view(
            input_shape[0],
            int(d),
            int(float(input_shape[1] * input_shape[2]) / float(d)),
            input_shape[3]
        )
        return x

    def forward(self, feature):
        # 3x3 conv + ReLU on the input feature map
        rpn_conv1 = F.relu(self.rpn_net(feature), inplace=True)

        # rpn classification scores, 2 per anchor: [b, num_anchors * 2, feat_h, feat_w]
        rpn_cls_score = self.rpn_cls_score_net(rpn_conv1)
        # reshape so the bg/fg scores sit on dim 1: [b, 2, num_anchors * feat_h, feat_w]
        rpn_cls_score_reshape = self.reshape(rpn_cls_score, 2)
        # softmax over the bg/fg dimension: [b, 2, num_anchors * feat_h, feat_w]
        rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, 1)
        # Move channel to the last dimension, to fit the input of python functions:
        # [b, 2, num_anchors * feat_h, feat_w] -> [b, num_anchors * 2, feat_h, feat_w]
        # -> [b, feat_h, feat_w, num_anchors * 2]
        rpn_cls_prob = rpn_cls_prob_reshape.view_as(rpn_cls_score).permute(0, 2, 3, 1)

        # raw scores, channels last: [b, feat_h, feat_w, num_anchors * 2]
        # (this must come after view_as above, which relies on the un-permuted shape)
        rpn_cls_score = rpn_cls_score.permute(0, 2, 3, 1)
        # [b, 2, num_anchors * feat_h, feat_w] -> [b, num_anchors * feat_h, feat_w, 2]
        rpn_cls_score_reshape = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous()

        batch = rpn_cls_score_reshape.shape[0]
        # predicted class, i.e. the class with the larger score:
        # [b, num_anchors * feat_h * feat_w, 2] -> [b, num_anchors * feat_h * feat_w]
        # ([1] selects the argmax indices returned by torch.max)
        rpn_cls_pred = torch.max(rpn_cls_score_reshape.view(batch, -1, 2), 2)[1]

        # offsets relative to the anchor boxes: [b, num_anchors * 4, feat_h, feat_w]
        rpn_bbox_pred = self.rpn_bbox_pred_net(rpn_conv1)
        rpn_bbox_pred = rpn_bbox_pred.permute(0, 2, 3, 1).contiguous()  # [b, feat_h, feat_w, num_anchors * 4]

        # print('rpn_head rpn_cls_score', rpn_cls_score.shape)
        # print('rpn_head rpn_cls_prob', rpn_cls_prob.shape)
        # print('rpn_head rpn_cls_pred', rpn_cls_pred.shape)
        # print('rpn_head rpn_bbox_pred', rpn_bbox_pred.shape)

        # rpn_cls_score:                          [batch, feat_h, feat_w, num_anchors * 2]
        # rpn_cls_score_reshape:                  [batch, num_anchors * feat_h, feat_w, 2]
        # rpn_cls_prob = softmax(rpn_cls_score):  [batch, feat_h, feat_w, num_anchors * 2]
        # rpn_cls_pred = argmax(rpn_cls_score):   [batch, num_anchors * feat_h * feat_w]
        # rpn_bbox_pred:                          [batch, feat_h, feat_w, num_anchors * 4]
        return rpn_cls_score, rpn_cls_score_reshape, rpn_cls_prob, rpn_cls_pred, rpn_bbox_pred

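num_anchors: the number of anchors per feature-map pixel.

rpn_cls_score_net: produces the background/foreground classification scores for the anchors at each pixel, i.e. a classifier with one score per class (one-hot style), here with two classes.

rpn_bbox_pred_net: regression of the four box coordinates.

rpn_net: the convolution layer applied directly to the input.

reshape: used many times here, so pay special attention to it.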
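To make the reshape step more concrete, here is a minimal sketch of what the view does to the indices. The standalone `reshape` function mirrors the static method above, and the tiny sizes are hypothetical values picked only for illustration.

```python
import torch

def reshape(x, d):
    """Same view as RpnHead.reshape: [b, c, h, w] -> [b, d, c * h / d, w]."""
    b, c, h, w = x.size()
    return x.view(b, int(d), (c * h) // int(d), w)

# Hypothetical small sizes: A anchors per pixel on an H x W feature map
b, A, H, W = 1, 3, 2, 2
x = torch.arange(b * 2 * A * H * W, dtype=torch.float32).view(b, 2 * A, H, W)
y = reshape(x, 2)
print(y.shape)  # shape is [b, 2, A * H, W]

# Channel k * A + a of x at (h, w) ends up at y[:, k, a * H + h, w]
k, a, h, w = 1, 2, 1, 0
same = bool(y[0, k, a * H + h, w] == x[0, k * A + a, h, w])
print(same)
```

In other words, the first num_anchors channels go to index 0 of the new dimension and the last num_anchors channels go to index 1, which is what lets a softmax over that dimension compare the two scores of the same anchor.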

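2. Processing steps

First apply conv + relu.
Use the conv output for the two-class classification. rpn_cls_score: [b, num_anchors * 2, feat_h, feat_w]
Use the reshape function to bring the bg/fg scores of each anchor onto their own dimension. rpn_cls_score_reshape: [b, 2, num_anchors * feat_h, feat_w]
Pass the reshaped scores through softmax to get class probabilities. Softmax is taken over dim 1, the bg/fg dimension, so the two scores of each anchor are normalized into probabilities in (0, 1) that sum to 1. rpn_cls_prob_reshape: [b, 2, num_anchors * feat_h, feat_w]
Then view_as turns rpn_cls_prob_reshape from [b, 2, num_anchors * feat_h, feat_w] back into [b, num_anchors * 2, feat_h, feat_w]. This operates on contiguous memory and only reinterprets the shape; a permute follows. Why this round trip? What we want is rpn_cls_prob of shape [batch, feat_h, feat_w, num_anchors * 2]. The reshape splits the num_anchors * 2 channels into 2 groups of num_anchors, so the softmax over dim 1 pairs channel a with channel a + num_anchors, the two scores of the same anchor, and view_as restores the original layout. The net effect is a bg/fg two-class classification for every anchor at every pixel.
The permute of rpn_cls_score that comes next is then easy to understand: it also produces the layout [batch, feat_h, feat_w, num_anchors * 2]. Note that the view_as of rpn_cls_prob_reshape sits between the reshape and the permute of rpn_cls_score, because view_as needs the shape of rpn_cls_score before it is permuted.
rpn_cls_score_reshape, i.e. the classification scores, is permuted as well: [b, 2, num_anchors * feat_h, feat_w] -> [batch, num_anchors * feat_h, feat_w, 2]. The main purpose is to get at the fg scores, which is covered later.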
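The reshape, softmax, view_as and permute steps above can be checked with a short sketch. It uses a random tensor as a stand-in for the output of rpn_cls_score_net; the sizes are hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, A, H, W = 2, 9, 4, 5  # hypothetical batch size, anchors per pixel, feature map size
rpn_cls_score = torch.randn(b, 2 * A, H, W)  # stand-in for the conv output

# [b, 2A, H, W] -> [b, 2, A*H, W], softmax over the bg/fg dimension
rpn_cls_score_reshape = rpn_cls_score.view(b, 2, A * H, W)
rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, dim=1)

# back to [b, 2A, H, W], then channels last: [b, H, W, 2A]
rpn_cls_prob = rpn_cls_prob_reshape.view_as(rpn_cls_score).permute(0, 2, 3, 1)
print(rpn_cls_prob.shape)

# The two probabilities of anchor a sit in channels a and a + A, so they sum to 1
pair_sum = rpn_cls_prob[..., :A] + rpn_cls_prob[..., A:]
print(torch.allclose(pair_sum, torch.ones_like(pair_sum)))
```

The check at the end is the point of the exercise: after the round trip, the probabilities that belong together are A channels apart, not adjacent.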

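rpn_cls_pred is the predicted class (bg or fg) of every anchor on the feature map. The scores are viewed as [batch, num_anchors * feat_h * feat_w, 2], and the index of the larger score along the last dimension is taken, so the result has shape [batch, num_anchors * feat_h * feat_w].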
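A small hand-written example, with made-up scores, shows what the `torch.max(...)[1]` call returns. Which index means bg and which means fg depends on the labelling convention of the surrounding code, so that mapping is left open here.

```python
import torch

b = 1
# Made-up scores with layout [b, num_anchors * feat_h, feat_w, 2]
rpn_cls_score_reshape = torch.tensor([[[[0.1, 0.9], [2.0, -1.0]],
                                       [[0.3, 0.2], [-0.5, 0.5]]]])

# Flatten to one row of 2 scores per anchor, then take the index of the larger score
flat = rpn_cls_score_reshape.view(b, -1, 2)   # [b, 4, 2]
rpn_cls_pred = torch.max(flat, 2)[1]          # [b, 4], class indices
print(rpn_cls_pred.shape)
print(rpn_cls_pred)
```

`torch.max` with a dimension argument returns a (values, indices) pair, so `[1]` picks the indices; `flat.argmax(dim=2)` gives the same result.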

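rpn_bbox_pred is easier to understand: the four coordinates are regressed directly from the feature map. [b, num_anchors * 4, feat_h, feat_w] -> [batch, feat_h, feat_w, num_anchors * 4]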
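As a rough illustration of why the channels are moved to the last dimension, the sketch below flattens the permuted tensor into rows of 4 values. The index arithmetic holds for any contiguous tensor; reading `c // 4` as the anchor index is an assumption about how downstream code consumes the output, and the sizes are hypothetical.

```python
import torch

b, A, H, W = 1, 3, 2, 2  # hypothetical sizes
rpn_bbox_pred = torch.arange(b * 4 * A * H * W, dtype=torch.float32).view(b, 4 * A, H, W)

# [b, A*4, H, W] -> [b, H, W, A*4]
permuted = rpn_bbox_pred.permute(0, 2, 3, 1).contiguous()
# One row of 4 regression values per (pixel, anchor) pair: [b, H*W*A, 4]
deltas = permuted.view(b, -1, 4)
print(deltas.shape)

# Channel c at pixel (h, w) lands in row (h * W + w) * A + c // 4, column c % 4
c, h, w = 7, 1, 0
row, col = (h * W + w) * A + c // 4, c % 4
match = bool(deltas[0, row, col] == rpn_bbox_pred[0, c, h, w])
print(match)
```

Without the permute, a plain view would mix values from different pixels into the same row, which is why `.contiguous()` follows the permute before any further reshaping.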

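3. Summary

Input: feature [batch, in_channels, feat_h, feat_w] (in_channels is 512 by default)

Output:

rpn_cls_score: [batch, feat_h, feat_w, num_anchors * 2], the bg/fg scores of every anchor at every pixel, without softmax
rpn_cls_score_reshape: [batch, num_anchors * feat_h, feat_w, 2], returned mainly because later computation needs it
rpn_cls_prob = softmax(rpn_cls_score): [batch, feat_h, feat_w, num_anchors * 2], the predicted probabilities after softmax
rpn_cls_pred = argmax(rpn_cls_score): [batch, num_anchors * feat_h * feat_w], the predicted class, bg or fg
rpn_bbox_pred: [batch, feat_h, feat_w, num_anchors * 4], the bounding box regression terms

From the shapes alone we can see what rpn_head is for: classify every anchor on the feature map and obtain the corresponding bounding box regression. The classification comes in two forms, the raw scores and the scores after softmax. That is why the expected outputs have num_anchors * 2 (classification) or num_anchors * 4 (regression) as their last dimension.

rpn_cls_score_reshape can in fact be used to compute the classification loss.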
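One way that loss could look is sketched below. This is an assumption-laden illustration, not the code of either referenced repository: the label tensor `rpn_labels` and its convention (1 = fg, 0 = bg, -1 = ignored) are hypothetical, and random tensors stand in for the real head output and anchor targets.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, A, H, W = 1, 9, 4, 5  # hypothetical sizes

# Stand-in for the head output, layout [b, num_anchors * feat_h, feat_w, 2]
rpn_cls_score_reshape = torch.randn(b, A * H, W, 2)

# Hypothetical anchor labels with the same leading layout: 1 = fg, 0 = bg, -1 = ignored
rpn_labels = torch.randint(-1, 2, (b, A * H, W))
rpn_labels[0, 0, 0] = 1  # make sure at least one anchor is not ignored

# One row of 2 raw scores per anchor; cross_entropy applies log-softmax itself,
# which is why the un-normalized scores are used here and not rpn_cls_prob
loss = F.cross_entropy(
    rpn_cls_score_reshape.view(-1, 2),
    rpn_labels.view(-1),
    ignore_index=-1,
)
print(loss.item())
```

Having the 2 scores on the last dimension is what makes the `view(-1, 2)` possible without any further permute.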
