Faster R-CNN code walkthrough: references

https://github.com/adityaarun1/pytorch_fast-er_rcnn

https://github.com/jwyang/faster-rcnn.pytorch

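The main job of rpn_head is to classify each anchor on the feature map as background (bg) or foreground (fg), and to regress a box offset for each anchor.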

import torch
import torch.nn as nn
import torch.nn.functional as F

# NOTE: cfg is the project's configuration dict and is assumed to be in scope here.


class RpnHead(nn.Module):
    def __init__(self, in_channels=512):
        super(RpnHead, self).__init__()
        self.anchor_scales = cfg['anchors_scales']
        self.anchor_ratios = cfg['anchor_ratios']

        # number of anchors per feature-map location (e.g. 3 scales * 3 ratios = 9)
        self.num_anchors = len(self.anchor_scales) * len(self.anchor_ratios)
        # number of bg/fg score channels: each anchor gets 2 scores (bg and fg)
        self.nc_score_out = len(self.anchor_scales) * len(self.anchor_ratios) * 2

        # 3x3 conv applied to the input feature map (ReLU is applied in forward)
        self.rpn_net = nn.Conv2d(in_channels, in_channels, [3, 3], padding=1)

        # 1x1 conv producing the bg/fg classification scores
        self.rpn_cls_score_net = nn.Conv2d(in_channels, self.num_anchors * 2, [1, 1])
        # 1x1 conv producing the 4 box-regression values per anchor
        self.rpn_bbox_pred_net = nn.Conv2d(in_channels, self.num_anchors * 4, [1, 1])


    @staticmethod
    def reshape(x, d):
        # View [b, c, h, w] as [b, d, c * h / d, w]; the element order in memory is unchanged.
        input_shape = x.size()
        x = x.view(
            input_shape[0],
            int(d),
            (input_shape[1] * input_shape[2]) // int(d),
            input_shape[3]
        )
        return x

    def forward(self, feature):
        # 3x3 conv + ReLU over the input feature map
        rpn_conv1 = F.relu(self.rpn_net(feature), inplace=True)

        # rpn classification scores, 2 per anchor: [b, num_anchors * 2, feat_h, feat_w]
        rpn_cls_score = self.rpn_cls_score_net(rpn_conv1)
        # group the scores into the 2 classes: [b, 2, num_anchors * feat_h, feat_w]
        rpn_cls_score_reshape = self.reshape(rpn_cls_score, 2)
        # softmax over the bg/fg dimension: [b, 2, num_anchors * feat_h, feat_w]
        rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, 1)
        # Move channel to the last dimension, to fit the input of python functions
        # [b, 2, num_anchors * feat_h, feat_w] -> [b, num_anchors * 2, feat_h, feat_w]
        # -> [batch, feat_h, feat_w, num_anchors * 2]
        rpn_cls_prob = rpn_cls_prob_reshape.view_as(rpn_cls_score).permute(0, 2, 3, 1)

        # raw scores in the same layout: [batch, feat_h, feat_w, num_anchors * 2]
        rpn_cls_score = rpn_cls_score.permute(0, 2, 3, 1)
        # [b, 2, num_anchors * feat_h, feat_w] -> [batch, num_anchors * feat_h, feat_w, 2]
        rpn_cls_score_reshape = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous()

        batch = rpn_cls_score_reshape.shape[0]
        # view the scores as [batch, num_anchors * feat_h * feat_w, 2] and take the argmax over
        # the last dimension; [1] selects the indices returned by torch.max (the predicted class)
        rpn_cls_pred = torch.max(rpn_cls_score_reshape.view(batch, -1, 2), 2)[1]

        # get rpn offsets to the anchor boxes: [b, num_anchors * 4, feat_h, feat_w]
        rpn_bbox_pred = self.rpn_bbox_pred_net(rpn_conv1)
        # -> [batch, feat_h, feat_w, num_anchors * 4]
        rpn_bbox_pred = rpn_bbox_pred.permute(0, 2, 3, 1).contiguous()

        # print('rpn_head rpn_cls_score', rpn_cls_score.shape)
        # print('rpn_head rpn_cls_prob', rpn_cls_prob.shape)
        # print('rpn_head rpn_cls_pred', rpn_cls_pred.shape)
        # print('rpn_head rpn_bbox_pred', rpn_bbox_pred.shape)

        # rpn_cls_score:                           [batch, feat_h,               feat_w, num_anchors * 2]
        # rpn_cls_score_reshape:                   [batch, num_anchors * feat_h, feat_w, 2]
        # rpn_cls_prob = softmax(rpn_cls_score):   [batch, feat_h,               feat_w, num_anchors * 2]
        # rpn_cls_pred = argmax(rpn_cls_score):    [batch, num_anchors * feat_h * feat_w]
        # rpn_bbox_pred:                           [batch, feat_h,               feat_w, num_anchors * 4]
        return rpn_cls_score, rpn_cls_score_reshape, rpn_cls_prob, rpn_cls_pred, rpn_bbox_pred

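num_anchors: the number of anchors at each feature-map location.

rpn_cls_score_net: produces the bg/fg classification scores for every anchor at each location. Each anchor gets two scores, one per class, so this is a two-class softmax classifier rather than a single logit.

rpn_bbox_pred_net: regresses the four box coordinates for every anchor.

rpn_net: the conv layer applied directly to the input feature map.

reshape: used in many places below, so pay close attention to what it does to the layout.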
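To make the effect of reshape on the channel layout concrete, here is a minimal sketch. The function reshape_view is a hypothetical standalone copy of the static method, and the sizes (3 anchors on a 2x2 feature map) are made up for illustration.

```python
import torch


def reshape_view(x, d):
    # Hypothetical standalone copy of RpnHead.reshape: [b, c, h, w] -> [b, d, c*h/d, w]
    b, c, h, w = x.size()
    return x.view(b, int(d), (c * h) // int(d), w)


A, H, W = 3, 2, 2                      # illustrative sizes, not taken from the original config
scores = torch.arange(1 * 2 * A * H * W, dtype=torch.float32).view(1, 2 * A, H, W)
grouped = reshape_view(scores, 2)      # [1, 2, A * H, W]

# Group 0 holds the first A channels, group 1 holds the last A channels.
first_half = scores[:, :A].reshape(1, A * H, W)
second_half = scores[:, A:].reshape(1, A * H, W)
same_layout = torch.equal(grouped[:, 0], first_half) and torch.equal(grouped[:, 1], second_half)
print(tuple(grouped.shape), same_layout)
```

In other words, reshape(x, 2) does not interleave anything: it simply splits the 2 * num_anchors channels into two consecutive halves.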

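II. Processing steps

  1. Apply conv + relu to the input feature map.
  2. Feed the result to the two-class classifier to get rpn_cls_score [b, num_anchors * 2, feat_h, feat_w].
  3. Use the reshape function to group the bg and fg scores: rpn_cls_score_reshape [b, 2, num_anchors * feat_h, feat_w].
  4. Apply softmax to the reshaped scores along dimension 1 to get class probabilities. Softmax is used because it normalizes the two scores of each anchor into probabilities in (0, 1) that sum to 1. The result is rpn_cls_prob_reshape [b, 2, num_anchors * feat_h, feat_w].
  5. Use view_as to turn rpn_cls_prob_reshape from [b, 2, num_anchors * feat_h, feat_w] back into [b, num_anchors * 2, feat_h, feat_w], then permute. view_as works on contiguous memory and does not move any data, so it exactly undoes the reshape from step 3. Because the reshape splits the num_anchors * 2 channels into two halves, the softmax pairs channel k with channel k + num_anchors as the two class probabilities of anchor k. The target layout is rpn_cls_prob [batch, feat_h, feat_w, num_anchors * 2], which gives the bg/fg classification of all anchors at each location.
  6. The permute of rpn_cls_score is then easy to understand: it also produces the layout [batch, feat_h, feat_w, num_anchors * 2]. Note that in the code the view_as of rpn_cls_prob_reshape sits between the reshape of rpn_cls_score and this permute.
  7. rpn_cls_score_reshape, the grouped classification scores, is permuted as well: [b, 2, num_anchors * feat_h, feat_w] -> [batch, num_anchors * feat_h, feat_w, 2]. The main purpose is to make the fg score easy to read out later.
  8. rpn_cls_pred is the predicted class (bg or fg) of every anchor. The scores are viewed as [batch, num_anchors * feat_h * feat_w, 2] and the argmax over the last dimension is taken, which gives [batch, num_anchors * feat_h * feat_w].
  9. rpn_bbox_pred is easier to understand: the four coordinates are regressed directly from the feature map. [b, num_anchors * 4, feat_h, feat_w] -> [batch, feat_h, feat_w, num_anchors * 4]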
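The shape bookkeeping in the steps above can be checked with a short sketch. It replaces the conv output with a random tensor and uses made-up sizes (B, A, H, W are assumptions for illustration), then verifies that after the softmax, channel k and channel k + A hold the two probabilities of the same anchor.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, A, H, W = 2, 9, 4, 5                               # illustrative sizes
rpn_cls_score = torch.randn(B, 2 * A, H, W)           # stand-in for the 1x1 conv output

# Steps 3-4: group into 2 classes and apply softmax over that dimension.
score_reshape = rpn_cls_score.view(B, 2, A * H, W)
prob_reshape = F.softmax(score_reshape, dim=1)

# Step 5: back to [B, 2A, H, W], then move channels last.
rpn_cls_prob = prob_reshape.view_as(rpn_cls_score).permute(0, 2, 3, 1)

# Steps 7-8: channels last, flatten the anchors, argmax over the 2 classes.
score_last = score_reshape.permute(0, 2, 3, 1).contiguous()
rpn_cls_pred = torch.max(score_last.view(B, -1, 2), 2)[1]

# Channel k and channel k + A are the two probabilities of the same anchor.
pair_sum = rpn_cls_prob[..., :A] + rpn_cls_prob[..., A:]
pairs_ok = torch.allclose(pair_sum, torch.ones_like(pair_sum))
print(tuple(rpn_cls_prob.shape), tuple(score_last.shape), tuple(rpn_cls_pred.shape), pairs_ok)
```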

III. Summary

Input: feature [batch, in_channels, feat_h, feat_w] (in_channels defaults to 512)

Output:

rpn_cls_score:                           [batch, feat_h,               feat_w, num_anchors * 2]  bg/fg scores of every anchor at each location, before softmax
rpn_cls_score_reshape:                   [batch, num_anchors * feat_h, feat_w, 2]                returned mainly because later computations need it
rpn_cls_prob = softmax(rpn_cls_score):   [batch, feat_h,               feat_w, num_anchors * 2]  predicted probabilities after softmax
rpn_cls_pred = argmax(rpn_cls_score):    [batch, num_anchors * feat_h * feat_w]                  predicted class, bg or fg, of every anchor
rpn_bbox_pred:                           [batch, feat_h,               feat_w, num_anchors * 4]  bounding-box regression terms

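The shapes show what rpn_head is for: it classifies every anchor on the feature map and regresses the corresponding bounding box. There are two classification outputs, the raw scores and the softmax probabilities. The channels-last outputs therefore end in a dimension of num_anchors * 2 (classification) or num_anchors * 4 (regression).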
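One practical consequence of these layouts is that the outputs do not all flatten the anchors in the same order. The sketch below reads out a foreground probability per anchor and compares it with rpn_cls_pred. It assumes that class index 1 (the last num_anchors channels) is foreground, which the text above does not state, and it uses made-up sizes and a stand-in tensor instead of the conv output.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, A, H, W = 2, 3, 4, 5                                   # illustrative sizes
n = B * 2 * A * H * W
# Distinct values stand in for the conv output, so no bg/fg pair is tied.
raw = (torch.randperm(n).float() * 0.01).view(B, 2 * A, H, W)

# Same operations as forward(), written inline.
score_reshape = raw.view(B, 2, A * H, W)
rpn_cls_prob = F.softmax(score_reshape, dim=1).view_as(raw).permute(0, 2, 3, 1)
rpn_cls_score_reshape = score_reshape.permute(0, 2, 3, 1).contiguous()
rpn_cls_pred = torch.max(rpn_cls_score_reshape.view(B, -1, 2), 2)[1]

# Assumption: class index 1 is foreground, i.e. the last A channels are fg.
fg_prob = rpn_cls_prob[..., A:]                           # [B, H, W, A], order (h, w, a)

# rpn_cls_pred is flattened in (a, h, w) order, so reorder before comparing.
fg_from_prob = (fg_prob > 0.5).permute(0, 3, 1, 2).reshape(B, -1)
consistent = torch.equal(fg_from_prob, rpn_cls_pred.bool())
print(tuple(fg_prob.shape), tuple(rpn_cls_pred.shape), consistent)
```

Without the permute, the comparison would mix up anchors, because rpn_cls_prob is ordered by location first while rpn_cls_pred is ordered by anchor first.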

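rpn_cls_score_reshape can in fact be used to compute the classification loss, since its last dimension holds the two raw scores of each anchor.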
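As a rough sketch of how such a loss could be computed, the example below flattens rpn_cls_score_reshape to one row of two scores per anchor and applies cross-entropy. The label tensor rpn_labels and its convention (1 = fg, 0 = bg, -1 = ignored) are assumptions for illustration, not something taken from the code above. In a real pipeline the labels would come from matching anchors to ground-truth boxes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, A, H, W = 1, 9, 4, 5                                   # illustrative sizes
# Stand-in for the rpn_cls_score_reshape output: [B, A * H, W, 2]
rpn_cls_score_reshape = torch.randn(B, A * H, W, 2)

# Hypothetical anchor labels: 1 = fg, 0 = bg, -1 = ignored during training.
rpn_labels = torch.randint(-1, 2, (B, A * H, W))

logits = rpn_cls_score_reshape.view(-1, 2)                # one row of 2 scores per anchor
labels = rpn_labels.view(-1)
# ignore_index drops the anchors labeled -1 from the average.
rpn_cls_loss = F.cross_entropy(logits, labels, ignore_index=-1)
print(rpn_cls_loss.item())
```

Cross-entropy takes raw scores and applies log-softmax internally, which is why the un-normalized rpn_cls_score_reshape is used here rather than rpn_cls_prob.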

 

 
