MaskRCNN源码解析1:整体结构概述

MaskRCNN源码解析2:特征图与anchors生成

MaskRCNN源码解析3:RPN、ProposalLayer、DetectionTargetLayer

MaskRCNN源码解析4-0:ROI Pooling 与 ROI Align理论

MaskRCNN源码解析4:头网络(Networks Heads)解析

MaskRCNN源码解析5:损失部分解析

 

目录

MaskRCNN概述:

D),损失部分解析

1,rpn 分类损失  交叉熵

2,rpn 回归损失  SmoothL1

3,mrcnn 分类损失  交叉熵

4,mrcnn 回归损失  SmoothL1

5,mask 损失  掩膜二进制交叉熵

Smooth-L1


MaskRCNN概述:

       Mask R-CNN是一个小巧、灵活的通用对象实例分割框架(object instance segmentation)。它不仅可对图像中的目标进行检测,还可以对每一个目标给出一个高质量的分割结果。它在Faster R-CNN[1]基础之上进行扩展,并行地在bounding box recognition分支上添加一个用于预测目标掩模(object mask)的新分支。该网络还很容易扩展到其他任务中,比如估计人的姿势,也就是关键点识别(person keypoint detection)。该框架在COCO的一些列挑战任务重都取得了最好的结果,包括实例分割(instance segmentation)、候选框目标检测(bounding-box object detection)和人关键点检测(person keypoint detection)。

参考文章:

Mask RCNN 学习笔记

MaskRCNN源码解读

令人拍案称奇的Mask RCNN

论文笔记:Mask R-CNN

Mask R-CNN个人理解

解析源码地址:

https://github.com/matterport/Mask_RCNN

D),损失部分解析

       Mask RCNN中总共有五个损失函数,分别是rpn网络的两个损失mrcnn的两个损失,以及mask分支的损失函数
前四个损失函数与fasterrcnn的损失函数一样,最后的mask损失函数的采用的是mask分支对于每个RoI有K*m^2维度的输出。K个(类别数)分辨率为m * m的二值mask。 Lmask为平均二值交叉熵损失(the average binary cross - entropy loss). 对于一个属于第k个类别的RoI, Lmask仅仅考虑第k个mask(其他的掩模输入不会贡献到损失函数中)。这样的定义会允许对每个类别都会生成掩模,并且不会存在类间竞争。

代码中损失部分的整体代码如下:

            # *************************8,计算各部分的损失******************************************************************
            # maskrcnn中总共有五个损失函数,分别是rpn网络的两个损失,mrcnn的两个损失,以及mask分支的损失函数。
            # 前四个损失函数与fasterrcnn的损失函数一样,最后的mask损失函数的采用的是mask分支对于每个RoI有K*m^2维度的输出。
            # K个(类别数)分辨率为m * m的二值mask。 
            # 因此作者利用了aper - pixelsigmoid,并且定义Lmask为平均二值交叉熵损失(the average binary cross - entropy loss). 
            # 对于一个属于第k个类别的RoI, Lmask仅仅考虑第k个mask(其他的掩模输入不会贡献到损失函数中)。
            # 这样的定义会允许对每个类别都会生成掩模,并且不会存在类间竞争。

            # Losses
            # rpn 分类损失
            rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
                [input_rpn_match, rpn_class_logits])
            # rpn 回归损失
            rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
                [input_rpn_bbox, input_rpn_match, rpn_bbox])
            # mrcnn 分类损失
            class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
                [target_class_ids, mrcnn_class_logits, active_class_ids])
            # mrcnn 回归损失
            bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
                [target_bbox, target_class_ids, mrcnn_bbox])
            # mask 损失
            mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
                [target_mask, target_class_ids, mrcnn_mask])

1,rpn 分类损失  交叉熵

rpn_match与GT有关,前景为1背景为0;   
rpn_class_logits 是rpn_graph中生成的,是特征图Reshape to [batch, anchors, 2]但没有经过softmax激活的值。

# rpn 分类损失  交叉熵
def rpn_class_loss_graph(rpn_match, rpn_class_logits):
    """RPN anchor classifier loss.

    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for BG/FG.
    """
    # Squeeze last dim to simplify
    rpn_match = tf.squeeze(rpn_match, -1)
    # Get anchor classes. Convert the -1/+1 match to 0/1 values. # 正样本转换为1,负样本和忽略的转换为0
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    # Positive and Negative anchors contribute to the loss, but neutral anchors (match value = 0) don't.
    indices = tf.where(K.not_equal(rpn_match, 0))  # 取不等于0的,即只取正样本
    # Pick rows that contribute to the loss and filter out the rest.
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices) # 选择对损失由贡献的行,忽略其他行
    anchor_class = tf.gather_nd(anchor_class, indices)
    # 交叉熵损失Cross entropy loss
    loss = K.sparse_categorical_crossentropy(target=anchor_class,
                                             output=rpn_class_logits,
                                             from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))  # 如果损失大于0输出,小于0输出0
    return loss

2,rpn 回归损失  SmoothL1

target_bbox 就是GT
rpn_match与GT有关,前景为1背景为0;
rpn_bbox  是rpn_graph中生成的,特征图Reshape to [batch, anchors, 4]的值。

SmoothL1损失之前解析其他算法时已经讲过了,这里也不多说了。

# rpn 回归损失  SmoothL1
def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
    """Return the RPN bounding box loss graph.

    config: the model config object.
    target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
        Uses 0 padding to fill in unsed bbox deltas.
    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
    """
    # Positive anchors contribute to the loss, but negative and
    # neutral anchors (match value of 0 or -1) don't.
    rpn_match = K.squeeze(rpn_match, -1)  # squeeze()将下标为axis的一维从张量中移除
    indices = tf.where(K.equal(rpn_match, 1))  # 取正样本索引

    # Pick bbox deltas that contribute to the loss
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)   # 选择正样本偏移量

    # Trim target bounding box deltas to the same length as rpn_bbox.
    # 将目标边界框增量修剪为与rpn_bbox相同的长度。
    batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
    target_bbox = batch_pack_graph(target_bbox, batch_counts, config.IMAGES_PER_GPU)

    loss = smooth_l1_loss(target_bbox, rpn_bbox)
    
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss

3,mrcnn 分类损失  交叉熵

 target_class_ids,  目标类别ID GT;
pred_class_logits, 特征图由头网络连接全连接层得到,预测的类别ID;
active_class_ids  实际的类别ids 80类;
实际用target_class_ids和pred_class_logits计算交叉熵损失;
active_class_ids用于消除不在图像的预测类别中的类别的预测损失。

# mrcnn 分类损失  交叉熵
def mrcnn_class_loss_graph(target_class_ids, pred_class_logits, active_class_ids):
    """Loss for the classifier head of Mask RCNN.

    target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
        padding to fill in the array.
    pred_class_logits: [batch, num_rois, num_classes]
    active_class_ids: [batch, num_classes]. Has a value of 1 for
        classes that are in the dataset of the image, and 0
        for classes that are not in the dataset.
    """
    # During model building, Keras calls this function with
    # target_class_ids of type float32. Unclear why. Cast it
    # to int to get around it.
    target_class_ids = tf.cast(target_class_ids, 'int64')

    # Find predictions of classes that are not in the dataset.
    # 查找不在数据集中的类的预测
    pred_class_ids = tf.argmax(pred_class_logits, axis=2)
    # TODO: Update this line to work with batch > 1. Right now it assumes all
    #       images in a batch have the same active_class_ids
    pred_active = tf.gather(active_class_ids[0], pred_class_ids)

    # Loss
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=target_class_ids, logits=pred_class_logits)

    # Erase losses of predictions of classes that are not in the active
    # classes of the image.
    # 消除不在图像的预测类别中的类别的预测损失。
    loss = loss * pred_active

    # Computer loss mean. Use only predictions that contribute
    # to the loss to get a correct mean.
    loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
    return loss

4,mrcnn 回归损失  SmoothL1

target_bbox, 就是GT框
target_class_ids,  GT框对应的类别ID
pred_bbox  由特征图经过头网络卷积得到的预测框

# mrcnn 回归损失  SmoothL1
def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
    """Loss for Mask R-CNN bounding box refinement.

    target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
    target_class_ids: [batch, num_rois]. Integer class IDs.
    pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
    """
    # Reshape to merge batch and roi dimensions for simplicity.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    target_bbox = K.reshape(target_bbox, (-1, 4))
    pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))

    # Only positive ROIs contribute to the loss. And only
    # the right class_id of each ROI. Get their indices.
    positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_roi_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_roi_ix), tf.int64)
    indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)

    # Gather the deltas (predicted and true) that contribute to loss
    target_bbox = tf.gather(target_bbox, positive_roi_ix)
    pred_bbox = tf.gather_nd(pred_bbox, indices)

    # Smooth-L1 Loss
    loss = K.switch(tf.size(target_bbox) > 0,
                    smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss

5,mask 损失  掩膜二进制交叉熵

Lmask是mask分支上的损失函数,输出大小为K*m*m,其编码分辨率为m*m的K个二进制mask,即K个类别每个对应一个二进制mask,对每个像素使用sigmoid 函数,Lmask是平均二进制交叉熵损失。RoI的groundtruth类别为k,Lmask只定义在第k个Mask上,其余的mask属于对它没有影响(也就是说在训练的时候,虽然每个点都会有K个二进制mask,但是只有一个k类mask对损失有贡献,这个k值是分类branch预测出来的)。

Mask-RCNN没有类间竞争,因为其他类别不贡献损失。mask分支对每个类别都有预测,依靠分类层选择输出mask(此时大小应该是m*m,之预测了一个类别出来,只需要输出该类别对应的mask即可),使用FCN的一般方法是对每个像素使用softmax以及多项交叉熵损失,会出现类间竞争。二值交叉熵会使得每一类的 mask 不相互竞争,而不是和其他类别的 mask 比较 。

target_mask,  GT mask
target_class_ids,  GT框对应的类别ID
mrcnn_mask  由图训练的到的mask

def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
    """Mask binary cross-entropy loss for the masks head.

    target_masks: [batch, num_rois, height, width].
        A float32 tensor of values 0 or 1. Uses zero padding to fill array.
    target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
    pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
                with values from 0 to 1.
    """
    # Reshape for simplicity. Merge first two dimensions into one.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    mask_shape = tf.shape(target_masks)
    target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
    pred_shape = tf.shape(pred_masks)
    pred_masks = K.reshape(pred_masks,
                           (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
    # Permute predicted masks to [N, num_classes, height, width]
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])

    # Only positive ROIs contribute to the loss. And only
    # the class specific mask of each ROI.
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_ix), tf.int64)
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)

    # Gather the masks (predicted and true) that contribute to loss
    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather_nd(pred_masks, indices)

    # Compute binary cross entropy. If no positive ROIs, then return 0.
    # shape: [batch, roi, num_classes]
    loss = K.switch(tf.size(y_true) > 0,
                    K.binary_crossentropy(target=y_true, output=y_pred),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss

Smooth-L1

代码中的x=K.abs(y_true-y_pred) 

"""
Implements Smooth-L1 loss.
y_true and y_pred are typically: [N, 4], but could be any shape.
"""
def smooth_l1_loss(y_true, y_pred):
    diff = K.abs(y_true - y_pred)
    less_than_one = K.cast(K.less(diff, 1.0), "float32")
    loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
    return loss

Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐