Faster R-CNN Code Walkthrough (3): rpn_head
References for this Faster R-CNN code walkthrough:
https://github.com/adityaarun1/pytorch_fast-er_rcnn
https://github.com/jwyang/faster-rcnn.pytorch
The main job of rpn_head is to classify every anchor on the feature map as background (bg) or foreground (fg).
import torch
import torch.nn as nn
import torch.nn.functional as F

# cfg is the project's configuration dict; it is defined elsewhere in the repo.


class RpnHead(nn.Module):
    def __init__(self, in_channels=512):
        super(RpnHead, self).__init__()
        self.anchor_scales = cfg['anchors_scales']
        self.anchor_ratios = cfg['anchor_ratios']
        # number of anchors per feature-map pixel (3 * 3 anchors per pixel)
        self.num_anchors = len(self.anchor_scales) * len(self.anchor_ratios)
        # number of bg/fg score channels: each anchor gets 2 scores
        self.nc_score_out = len(self.anchor_scales) * len(self.anchor_ratios) * 2
        # 3x3 conv applied to the input feature map (ReLU is applied in forward)
        self.rpn_net = nn.Conv2d(in_channels, in_channels, [3, 3], padding=1)
        # 1x1 conv producing the bg/fg classification scores
        self.rpn_cls_score_net = nn.Conv2d(in_channels, self.num_anchors * 2, [1, 1])
        # 1x1 conv producing the 4 box-regression values per anchor
        self.rpn_bbox_pred_net = nn.Conv2d(in_channels, self.num_anchors * 4, [1, 1])

    @staticmethod
    def reshape(x, d):
        # [b, C, H, W] -> [b, d, C * H / d, W]; view() needs contiguous memory
        input_shape = x.size()
        x = x.view(
            input_shape[0],
            int(d),
            int(float(input_shape[1] * input_shape[2]) / float(d)),
            input_shape[3]
        )
        return x

    def forward(self, feature):
        # feature map after the conv + ReLU layer
        rpn_conv1 = F.relu(self.rpn_net(feature), inplace=True)
        # rpn classification scores, 2 per anchor: [b, num_anchors * 2, feat_h, feat_w]
        rpn_cls_score = self.rpn_cls_score_net(rpn_conv1)
        # bring the bg/fg pair onto dim 1: [b, 2, num_anchors * feat_h, feat_w]
        rpn_cls_score_reshape = self.reshape(rpn_cls_score, 2)
        # softmax over the bg/fg pair: [b, 2, num_anchors * feat_h, feat_w]
        rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, 1)
        # Restore [b, num_anchors * 2, feat_h, feat_w], then move channels to the
        # last dimension to fit the input of the Python functions downstream:
        # -> [batch, feat_h, feat_w, num_anchors * 2]
        rpn_cls_prob = rpn_cls_prob_reshape.view_as(rpn_cls_score).permute(0, 2, 3, 1)
        # raw scores, channels last: [batch, feat_h, feat_w, num_anchors * 2]
        rpn_cls_score = rpn_cls_score.permute(0, 2, 3, 1)
        # [b, 2, num_anchors * feat_h, feat_w] -> [batch, num_anchors * feat_h, feat_w, 2]
        rpn_cls_score_reshape = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous()
        batch = rpn_cls_score_reshape.shape[0]
        # Predicted class = index of the larger of the two scores.
        # torch.max(..., 2) returns (values, indices); [1] keeps the indices.
        # view: [batch, num_anchors * feat_h * feat_w, 2] -> result: [batch, num_anchors * feat_h * feat_w]
        rpn_cls_pred = torch.max(rpn_cls_score_reshape.view(batch, -1, 2), 2)[1]
        # rpn offsets relative to the anchor boxes: [b, num_anchors * 4, feat_h, feat_w]
        rpn_bbox_pred = self.rpn_bbox_pred_net(rpn_conv1)
        # -> [batch, feat_h, feat_w, num_anchors * 4]
        rpn_bbox_pred = rpn_bbox_pred.permute(0, 2, 3, 1).contiguous()
        # print('rpn_head rpn_cls_score', rpn_cls_score.shape)
        # print('rpn_head rpn_cls_prob', rpn_cls_prob.shape)
        # print('rpn_head rpn_cls_pred', rpn_cls_pred.shape)
        # print('rpn_head rpn_bbox_pred', rpn_bbox_pred.shape)
        # rpn_cls_score:          [batch, feat_h, feat_w, num_anchors * 2]
        # rpn_cls_score_reshape:  [batch, num_anchors * feat_h, feat_w, 2]
        # rpn_cls_prob = softmax(rpn_cls_score): [batch, feat_h, feat_w, num_anchors * 2]
        # rpn_cls_pred = argmax(rpn_cls_score):  [batch, num_anchors * feat_h * feat_w]
        # rpn_bbox_pred:          [batch, feat_h, feat_w, num_anchors * 4]
        return rpn_cls_score, rpn_cls_score_reshape, rpn_cls_prob, rpn_cls_pred, rpn_bbox_pred
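num_anchors: the number of anchors at a single feature-map pixel.
rpn_cls_score_net: produces the background/foreground classification scores at each pixel, two scores per anchor, i.e. a one-hot style two-class classification.
rpn_bbox_pred_net: regresses the four box coordinates for each anchor.
rpn_net: the convolution layer applied directly to the input feature.
reshape: used repeatedly in this module, so it deserves special attention.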
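Since reshape is the step that is easiest to misread, here is a minimal plain-Python sketch of the index arithmetic behind it. It assumes a contiguous (row-major) tensor, which is what view requires; the helper name reshape_index and the sizes 9 and 4 are made up for illustration and do not appear in the original code.

```python
def reshape_index(c, h, num_channels, feat_h, d=2):
    """Map (channel c, row h) of a contiguous [b, C, H, W] tensor to the
    (dim 1, dim 2) index it gets after view(b, d, C * H // d, W)."""
    rows_per_group = num_channels * feat_h // d
    flat = c * feat_h + h  # row-major offset over the (C, H) dimensions
    return flat // rows_per_group, flat % rows_per_group


num_anchors, feat_h = 9, 4  # illustrative sizes, not taken from the source
num_channels = 2 * num_anchors

for c in range(num_channels):
    for h in range(feat_h):
        k, r = reshape_index(c, h, num_channels, feat_h)
        a = c % num_anchors
        # the first num_anchors channels land in group 0, the rest in group 1
        assert k == c // num_anchors
        # inside a group, the row index encodes (anchor a, feature row h)
        assert r == a * feat_h + h
```

In other words, with d = 2 the new dim 1 separates channels 0..num_anchors-1 from channels num_anchors..2*num_anchors-1, so channel a and channel a + num_anchors of the same pixel end up as a pair along dim 1.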
II. Processing steps
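- First apply the 3x3 convolution followed by ReLU.
- Feed the result to the two-class classification conv to get rpn_cls_score [b, num_anchors * 2, feat_h, feat_w].
- Use the reshape function to bring the bg/fg score pair of each anchor onto its own dimension: rpn_cls_score_reshape [b, 2, num_anchors * feat_h, feat_w].
- Apply softmax along dim 1 of the reshaped scores to get class probabilities. Softmax is used here because it normalizes the two scores of each anchor into probabilities in (0, 1) that sum to 1: rpn_cls_prob_reshape [b, 2, num_anchors * feat_h, feat_w].
- view_as then turns rpn_cls_prob_reshape from [b, 2, num_anchors * feat_h, feat_w] back into [b, num_anchors * 2, feat_h, feat_w]. view only reinterprets the same contiguous memory, so this exactly undoes the earlier reshape. The permute that follows moves the channels last, giving the expected rpn_cls_prob [batch, feat_h, feat_w, (num_anchors * 2)], so that each pixel carries the bg/fg probabilities of all its anchors.
- The permute of rpn_cls_score is then easy to understand: it also produces the layout [batch, feat_h, feat_w, (num_anchors * 2)]. Note the ordering: the view_as of rpn_cls_prob_reshape sits between the reshape of rpn_cls_score and this permute, because view_as reads the original shape of rpn_cls_score.
- rpn_cls_score_reshape, the classification score, is permuted as well: [b, 2, num_anchors * feat_h, feat_w] -> [batch, num_anchors * feat_h, feat_w, 2]. The main purpose is to make the fg score easy to pick out, which is discussed later.
- rpn_cls_pred is the predicted fg/bg class of every anchor on the feature map. It is the argmax over the last dimension of rpn_cls_score_reshape viewed as [batch, num_anchors * feat_h * feat_w, 2], so its shape is [batch, num_anchors * feat_h * feat_w].
- rpn_bbox_pred is the easiest to understand: the four coordinates are regressed directly from the feature, [b, num_anchors * 4, feat_h, feat_w] -> [batch, feat_h, feat_w, num_anchors * 4].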
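The reshape, softmax, view_as sequence in the steps above can be sketched for a single pixel in plain Python. This is only an illustration of which channels are normalized together, not the module's actual implementation; the function name pixel_probs and the score values are made up. Which of the two classes is bg and which is fg is not stated in this section, so the sketch just calls them class 0 and class 1.

```python
import math


def pixel_probs(scores, num_anchors):
    """scores: the 2 * num_anchors raw channel values at one pixel.
    Returns the probabilities in the same channel order, mimicking
    reshape -> softmax over dim 1 -> view_as for that pixel."""
    probs = [0.0] * (2 * num_anchors)
    for a in range(num_anchors):
        # channels a and a + num_anchors share dim 1 after the reshape
        s0, s1 = scores[a], scores[a + num_anchors]
        m = max(s0, s1)  # subtract the max for numerical stability
        e0, e1 = math.exp(s0 - m), math.exp(s1 - m)
        probs[a] = e0 / (e0 + e1)                # class 0 probability
        probs[a + num_anchors] = e1 / (e0 + e1)  # class 1 probability
    return probs


scores = [0.5, -1.0, 2.0, 1.5, 0.0, -2.0]  # num_anchors = 3, made-up values
probs = pixel_probs(scores, 3)
for a in range(3):
    print(a, round(probs[a], 4), round(probs[a + 3], 4))
```

Each anchor's two probabilities sum to 1, while probabilities of different anchors are independent of each other. That is the reason the scores are reshaped before softmax instead of applying softmax over all num_anchors * 2 channels at once.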
III. Summary
Input: feature [batch, in_channels, feat_h, feat_w] (in_channels defaults to 512)
Output:
rpn_cls_score: [batch, feat_h, feat_w, (num_anchors * 2)], the bg/fg scores of every anchor at each pixel, without softmax
rpn_cls_score_reshape: [batch, num_anchors * feat_h, feat_w, 2], returned mainly because later computations need it
rpn_cls_prob = softmax(rpn_cls_score): [batch, feat_h, feat_w, (num_anchors * 2)], the predicted probabilities after softmax
rpn_cls_pred = argmax(rpn_cls_score): [batch, num_anchors * feat_h * feat_w], the predicted class (bg or fg) of each anchor
rpn_bbox_pred: [batch, feat_h, feat_w, (num_anchors * 4)], the bounding-box regression terms
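The shapes already show what rpn_head is for: every anchor on the feature map is classified and given a corresponding bounding box. The classification is returned twice, once as raw scores and once after softmax. That is why the outputs are laid out with the last dimension equal to num_anchors * 2 (classification) or num_anchors * 4 (box regression).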
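As a quick check of the shapes listed above, the following sketch tabulates them for given sizes. The helper rpn_head_shapes is hypothetical, and the feature-map size 38 x 50 is only an illustrative value; num_anchors = 9 follows the 3 * 3 anchors per pixel mentioned in the code.

```python
def rpn_head_shapes(batch, num_anchors, feat_h, feat_w):
    """Return the expected shape of each rpn_head output as a tuple."""
    A, H, W = num_anchors, feat_h, feat_w
    return {
        "rpn_cls_score": (batch, H, W, 2 * A),
        "rpn_cls_score_reshape": (batch, A * H, W, 2),
        "rpn_cls_prob": (batch, H, W, 2 * A),
        "rpn_cls_pred": (batch, A * H * W),
        "rpn_bbox_pred": (batch, H, W, 4 * A),
    }


# illustrative sizes: one image, 9 anchors per pixel, a 38 x 50 feature map
shapes = rpn_head_shapes(batch=1, num_anchors=9, feat_h=38, feat_w=50)
for name, shape in shapes.items():
    print(name, shape)
```

Comparing these tuples with the commented-out print statements at the end of forward is an easy way to confirm that the module behaves as described.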
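rpn_cls_score_reshape can in fact be used to compute the classification loss, since its last dimension holds the two raw scores of each anchor.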
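To make that concrete, here is a rough plain-Python sketch of a two-class cross-entropy over rows of rpn_cls_score_reshape flattened to [N, 2]. It is an assumption about how such a loss could look, not the loss code of the referenced repositories: the function name rpn_cls_loss, the label values, and the convention that label -1 marks anchors to ignore are all hypothetical here.

```python
import math


def rpn_cls_loss(score_rows, labels, ignore_label=-1):
    """Mean cross-entropy over anchors whose label is not ignore_label.
    score_rows: list of (score_class0, score_class1) pairs, one per anchor.
    labels: list of 0, 1, or ignore_label, one per anchor."""
    total, count = 0.0, 0
    for (s0, s1), y in zip(score_rows, labels):
        if y == ignore_label:
            continue  # assumed convention: skip anchors not used for training
        m = max(s0, s1)
        # log-sum-exp of the two scores, computed stably
        log_z = m + math.log(math.exp(s0 - m) + math.exp(s1 - m))
        total += log_z - (s0, s1)[y]  # negative log-softmax of the true class
        count += 1
    return total / count if count else 0.0


rows = [(2.0, -1.0), (0.0, 0.0), (-3.0, 1.0)]  # made-up scores
labels = [0, -1, 1]                            # the middle anchor is ignored
loss = rpn_cls_loss(rows, labels)
print(loss)
```

The loss works on raw scores rather than on rpn_cls_prob because cross-entropy applies log-softmax internally, which is presumably why the head returns the unnormalized rpn_cls_score_reshape alongside the probabilities.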