1 Common Evaluation Metrics for Segmentation

1.1 What is MIoU

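MIoU stands for Mean Intersection over Union. For each class, it computes the ratio between the intersection and the union of two sets: the predicted pixels P and the ground-truth pixels G. It then averages this ratio over all classes.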
$$MIoU=\frac{1}{k} \sum_{i=1}^{k} \frac{|P_i \cap G_i|}{|P_i \cup G_i|}$$
Written with confusion-matrix counts, and with the classes indexed from 0 to k (k+1 classes including background), the same quantity is:
$$MIoU=\frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$
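Here i is the ground-truth class, j is the predicted class, and $p_{ij}$ is the number of pixels of class i that are predicted as class j. The denominator adds the row sum and the column sum of class i. $p_{ii}$ is subtracted once because it appears in both sums.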
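The two formulas describe the same per-class quantity. The sketch below computes IoU in two ways:

* from pixel masks, as |P_i ∩ G_i| / |P_i ∪ G_i|;
* from confusion-matrix counts, as p_ii / (row sum + column sum − p_ii).

The toy label and prediction arrays are made up for illustration. `iou_from_masks` and `iou_from_confusion` are hypothetical helper names. Rows are the ground truth and columns are the prediction, following the definition of p_ij above.

```python
import numpy as np

def iou_from_masks(label, pred, num_classes):
    """Per-class IoU as |P ∩ G| / |P ∪ G|, using boolean masks."""
    ious = []
    for c in range(num_classes):
        g = label == c                       # ground-truth pixels of class c
        p = pred == c                        # predicted pixels of class c
        inter = np.logical_and(g, p).sum()
        union = np.logical_or(g, p).sum()
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)

def iou_from_confusion(label, pred, num_classes):
    """Per-class IoU as p_ii / (sum_j p_ij + sum_j p_ji - p_ii)."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for i, j in zip(label.ravel(), pred.ravel()):
        cm[i, j] += 1                        # row i = true class, column j = predicted class
    tp = np.diag(cm)
    return tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)

# Hypothetical 2x3 label and prediction with 3 classes
label = np.array([[0, 0, 1], [1, 2, 2]])
pred  = np.array([[0, 1, 1], [1, 2, 0]])

a = iou_from_masks(label, pred, 3)
b = iou_from_confusion(label, pred, 3)
print("IoU from masks:    ", a)
print("IoU from confusion:", b)
print("MIoU:", a.mean())
```

Both routes give the same per-class values. The confusion-matrix form is the one used in the rest of this post, because it needs only one pass over the pixels.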

1.2 What is PA

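Pixel Accuracy (PA) is the fraction of all pixels that are classified correctly. In terms of the confusion matrix, it is the sum of the diagonal divided by the sum of all entries. See the code in Section 2.2.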

$$PA=\frac{\sum_{i=0}^{k}p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}$$
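Here is a small worked example with a made-up 3-class confusion matrix (k = 2). Rows are the ground truth and columns are the prediction. The diagonal holds the correctly classified pixels, and all entries sum to 100.

```latex
\begin{pmatrix} 50 & 2 & 0 \\ 3 & 30 & 2 \\ 1 & 4 & 8 \end{pmatrix}
\qquad
PA = \frac{p_{00}+p_{11}+p_{22}}{\sum_{i=0}^{2}\sum_{j=0}^{2} p_{ij}}
   = \frac{50+30+8}{100} = 0.88
```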

1.3 What is MPA

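Mean Pixel Accuracy (MPA) adjusts PA as follows. It first computes, for each class, the proportion of that class's pixels that are classified correctly. It then averages these per-class values.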
$$MPA=\frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}}$$
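The per-class averaging matters when classes are imbalanced, as background usually is in segmentation. The sketch below implements both formulas in plain Python. It uses a made-up 3-class matrix (rows are the ground truth) in which the small class 1 is mostly missed. PA stays high because background dominates the pixel count, while MPA drops.

```python
def pixel_accuracy(cm):
    """PA = sum_i p_ii / sum_i sum_j p_ij."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def mean_pixel_accuracy(cm):
    """MPA = 1/(k+1) * sum_i p_ii / sum_j p_ij (mean of per-class accuracy)."""
    per_class = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(per_class) / len(per_class)

# Hypothetical matrix: class 0 (background) dominates, class 1 is mostly missed
cm = [[950, 0, 0],
      [18, 2, 0],
      [10, 0, 20]]

print(f"PA  = {pixel_accuracy(cm):.4f}")
print(f"MPA = {mean_pixel_accuracy(cm):.4f}")
```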

2 Computing the Metrics from the Confusion Matrix

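Confusion matrix: a table that shows at a glance how the pixels of each class are predicted. Accuracy, Precision, Recall and IoU can all be computed from it. In this post, rows correspond to the ground-truth class and columns to the predicted class.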


For how the confusion matrix is generated, see my other post: "Using np.bincount() to generate the confusion matrix in segmentation".
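The idea behind the np.bincount approach can be sketched as follows. This is a simplified version that assumes labels and predictions are integer arrays with values in [0, n). Each (true, predicted) pair is encoded as the single index n * true + pred. The indices are counted and the counts are reshaped into an n×n matrix. `confusion_matrix` is a hypothetical helper name, not a library function.

```python
import numpy as np

def confusion_matrix(label, pred, n):
    """Rows = ground-truth class, columns = predicted class."""
    idx = n * label.ravel().astype(int) + pred.ravel().astype(int)  # one index per pixel
    return np.bincount(idx, minlength=n * n).reshape(n, n)

label = np.array([0, 0, 1, 1, 2, 2])
pred  = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(label, pred, 3)

print(cm)
print("ground-truth pixels per class (row sums):", cm.sum(axis=1))
print("predicted pixels per class (column sums):", cm.sum(axis=0))
```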


2.1 Computing MIoU


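true_dog = (7+2+28+111+18+801+13+17+0+3) 		# green box in the figure above: row sum, all pixels whose ground truth is dog
predict_dog = (1+0+8+48+13+801+4+17+1+0) 		# yellow box in the figure above: column sum, all pixels predicted as dog
# 801 (dog pixels correctly predicted as dog) is counted in both sums, so subtract it once
iou_dog = 801 / (true_dog + predict_dog - 801)	# 801 / (1000 + 893 - 801) ≈ 0.7335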
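The same numbers also show how IoU relates to the per-class precision and recall printed later in Section 4. Let TP = 801, FP = predict_dog − TP and FN = true_dog − TP. Then IoU = TP / (TP + FP + FN). Since 1/P + 1/R − 1 = (TP + FP + FN) / TP, this can be rewritten as 1 / (1/P + 1/R − 1). The sketch below only redoes this arithmetic with the dog row and column from the figure.

```python
# Counts from the dog example above (green box = row sum, yellow box = column sum)
tp = 801                                                    # dog pixels predicted as dog
true_dog = 7 + 2 + 28 + 111 + 18 + 801 + 13 + 17 + 0 + 3    # TP + FN
predict_dog = 1 + 0 + 8 + 48 + 13 + 801 + 4 + 17 + 1 + 0    # TP + FP

precision = tp / predict_dog          # TP / (TP + FP)
recall = tp / true_dog                # TP / (TP + FN)

iou_direct = tp / (true_dog + predict_dog - tp)       # TP / (TP + FP + FN)
iou_from_pr = 1 / (1 / precision + 1 / recall - 1)    # algebraically the same quantity

print(f"precision = {precision:.4f}, recall = {recall:.4f}")
print(f"IoU = {iou_direct:.4f} (direct), {iou_from_pr:.4f} (from P and R)")
```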



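import numpy as np

def Mean_Intersection_over_Union(confusion_matrix):
    # IoU of each class: diagonal / (row sum + column sum - diagonal)
    Per_class_IoU = np.diag(confusion_matrix) / (
                np.sum(confusion_matrix, axis=1) + np.sum(confusion_matrix, axis=0) -
                np.diag(confusion_matrix))
    MIoU = np.nanmean(Per_class_IoU) # mean over classes, skipping NaN (0/0 for a class absent from both label and prediction)
    return MIoU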


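np.diag(v, k=0): extracts a diagonal from a 2-D array, or builds a diagonal array from a 1-D one.
	k: which diagonal. 0 (the default) is the main diagonal, 1 is one above it, -1 is one below it, and so on.
np.sum(v, axis): if axis is None, all elements are summed into a single value. If axis is given, the sum is taken along that axis. For a 2-D array:
	axis=0: column sums, returns an array
	axis=1 or -1: row sums, returns an array
np.nanmean(): computes the mean while ignoring NaN entries. Here a NaN arises as 0/0, for a class that appears in neither the ground truth nor the prediction.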
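The three NumPy calls described above can be tried on a small hypothetical matrix in which class 2 never occurs. The example shows the diagonal offsets and the axis convention. It also shows why np.nanmean is used for the average instead of np.mean.

```python
import numpy as np

cm = np.array([[5, 1, 0],
               [2, 3, 0],
               [0, 0, 0]])   # class 2 appears in neither ground truth nor prediction

print(np.diag(cm))           # main diagonal: [5 3 0]
print(np.diag(cm, k=1))      # one above the main diagonal: [1 0]
print(np.diag(cm, k=-1))     # one below the main diagonal: [2 0]
print(np.sum(cm))            # all elements: 11
print(np.sum(cm, axis=0))    # column sums: [7 4 0]
print(np.sum(cm, axis=1))    # row sums: [6 5 0]

# Suppress the 0/0 warning; class 2 becomes NaN
with np.errstate(divide="ignore", invalid="ignore"):
    iou = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))

print(iou)                   # class 2 is 0/0 = nan
print(np.mean(iou))          # nan: one NaN poisons the plain mean
print(np.nanmean(iou))       # mean of the two finite values only
```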

2.2 Computing PA


import numpy as np

def Pixel_Accuracy(confusion_matrix):
    # correctly classified pixels (diagonal) / all pixels
    Acc = np.diag(confusion_matrix).sum() / confusion_matrix.sum()
    return Acc

2.3 Computing MPA


import numpy as np

def Pixel_Accuracy_Class(confusion_matrix):
    # -----------------------------------------#
    #   np.diag(confusion_matrix): 1-D array
    #   confusion_matrix.sum(axis=1): 1-D array (row sums)
    #   per_class_Acc: 1-D array, accuracy of each class
    # -----------------------------------------#
    per_class_Acc = np.diag(confusion_matrix) / confusion_matrix.sum(axis=1)
    Acc = np.nanmean(per_class_Acc)     # returns a single value
    return Acc
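Pixel_Accuracy_Class divides by the row sums directly. A class with no ground-truth pixels therefore gives 0/0 = NaN, which np.nanmean then skips.

The helpers in Section 4 handle this differently. They clamp the denominator with np.maximum(..., 1), so the same class scores 0 and pulls the average down.

The sketch below compares the two conventions side by side. It uses a made-up matrix in which class 2 has no ground-truth pixels but three false positives. Neither choice is wrong, but numbers produced under one convention are not directly comparable with the other.

```python
import numpy as np

cm = np.array([[40, 5, 3],
               [4, 16, 0],
               [0, 0, 0]])   # class 2: no ground-truth pixels, but 3 false positives

# Section 2.3 style: plain division, NaN for a class with no ground truth
with np.errstate(divide="ignore", invalid="ignore"):
    acc_nan = np.diag(cm) / cm.sum(axis=1)
mpa_nan = np.nanmean(acc_nan)            # averages over 2 classes

# Section 4 style: denominator clamped to >= 1, so the empty class scores 0
acc_clamped = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
mpa_clamped = np.mean(acc_clamped)       # averages over all 3 classes

print("per-class (NaN):    ", acc_nan, "MPA =", round(mpa_nan, 4))
print("per-class (clamped):", acc_clamped, "MPA =", round(mpa_clamped, 4))
```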

3 Complete Code for Computing MIoU


import os

from PIL import Image
from tqdm import tqdm

# ----------------------------------------------------------#
#   DeeplabV3 is the segmentation network; its code is in deeplab.py
#   It produces the dataset's segmentation predictions (8-bit grayscale PNGs); not used in this post
# ----------------------------------------------------------#
from deeplab import DeeplabV3   
# ---------------------------------------------------------------------#
#   compute_mIoU and show_results are defined in utils/utils_metrics.py;
#   compute_mIoU is explained in the next section
# ---------------------------------------------------------------------#
from utils.utils_metrics import compute_mIoU, show_results

if __name__ == "__main__":
    #   miou_mode specifies what this script computes when it runs
    #   miou_mode = 0: the full pipeline, i.e. generate predictions, then compute mIoU.
    #   miou_mode = 1: only generate predictions. Walkthrough: https://blog.csdn.net/weixin_45377629/article/details/124159784?spm=1001.2014.3001.5501
    #       (not covered in this post)
    #   miou_mode = 2: only compute mIoU (the part covered in this post).
    miou_mode       = 2
    #   Number of classes + 1, e.g. 2+1
    #   For VOC: number of object classes to distinguish + 1
    num_classes     = 21
    #   Classes to distinguish, the same as in json_to_dataset
    #   Class names; VOC in this example
    name_classes    = ["background","aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
    # name_classes    = ["_background_","cat","dog"]
    #   Folder containing the VOC dataset
    #   Defaults to the VOC dataset in the root directory
    #   Download: https://pan.baidu.com/s/1OZfxoyVUKlESsyqs1nuuuw  extraction code: wlna
    VOCdevkit_path  = '../VOCdevkit'

    #   image_ids: ['image name 1', 'image name 2', ...]
    image_ids       = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/val.txt"),'r').read().splitlines() 
    gt_dir          = os.path.join(VOCdevkit_path, "VOC2007/SegmentationClass/")
    miou_out_path   = "miou_out"
    #   pred_dir: folder of the predicted PNGs, 8-bit depth, grayscale
    #   (for comparison, an ordinary JPG is 3-channel RGB with 24-bit depth,
    #   and an RGBA PNG is 4-channel with 32-bit depth)
    pred_dir        = os.path.join(miou_out_path, 'detection-results')  

    #   Generate predictions; the output is 8-bit grayscale images
    #   Walkthrough: https://blog.csdn.net/weixin_45377629/article/details/124159784?spm=1001.2014.3001.5501
    if miou_mode == 0 or miou_mode == 1:
        if not os.path.exists(pred_dir):
            os.makedirs(pred_dir)

        #   Detailed walkthrough: https://blog.csdn.net/weixin_45377629/article/details/124124238
        print("Load model.")
        deeplab = DeeplabV3()
        print("Load model done.")

        print("Get predict result.")
        for image_id in tqdm(image_ids):
            image_path  = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg")
            image       = Image.open(image_path)
            # ------------------------------------#
            #   The returned image is an 8-bit grayscale image, saved as PNG
            #   deeplab.get_miou_png(image): see the walkthrough linked above
            #   image size: (original width, original height)
            # ------------------------------------#
            image       = deeplab.get_miou_png(image)   
            image.save(os.path.join(pred_dir, image_id + ".png"))
        print("Get predict result done.")

    #   Compute mIoU; how it is computed is analysed below
    #   gt_dir: VOC2007/SegmentationClass/, holds the ground-truth segmentation PNGs
    #   pred_dir: output path of the predictions; by now it should already contain the 8-bit predicted PNGs
    #   image_ids: image names, one per line of val.txt: ['image name 1', 'image name 2', ...]
    #   num_classes: number of classes + 1, so 21 for VOC
    #   name_classes: class names, e.g. ["_background_","cat","dog",...]
    #   compute_mIoU() is analysed in the next section
    if miou_mode == 0 or miou_mode == 2:
        print("Get miou.")
        # --------------------------------------------------------#
        #   hist: confusion matrix of the validation set, shape (21, 21)
        #   IoUs: IoU of each class, shape (21,)
        #   PA_Recall: recall (per-class pixel accuracy) of each class, from row sums, shape (21,)
        #   Precision: precision of each class, from column sums, shape (21,)
        # --------------------------------------------------------# 
        hist, IoUs, PA_Recall, Precision = compute_mIoU(gt_dir, pred_dir, image_ids, num_classes, name_classes)  # run the mIoU computation
        print("Get miou done.")

4 Walkthrough of compute_mIoU()


import csv
import os
from os.path import join

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

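# ---------------------------------------------------#
#   Build the confusion matrix. Let the label have width W and height H.
#   a: the label flattened to 1-D, shape (H×W,)
#   b: the prediction flattened to 1-D, shape (H×W,)
#   n: number of classes, 21
# ---------------------------------------------------#
def fast_hist(a, b, n):
    # ---------------------------------------------------#
    #   Keep only pixels whose label a lies in [0, n); this drops
    #   ignored label values such as VOC's 255 (object boundaries)
    #   k is a 1-D boolean array of shape (H×W,)
    # ---------------------------------------------------#
    k = (a >= 0) & (a < n)
    #   n * a + b encodes each (true class, predicted class) pair as one integer in [0, n**2)
    #   np.bincount counts how often each of these n**2 values occurs and returns a 1-D array
    #       minlength: minimum length of the returned array
    #   array.astype(int): makes sure the elements really are integers
    #   array.reshape(n, n): the result has shape (n, n); row = true class, column = predicted class
    #   Entries on the main diagonal are the correctly classified pixels
    #   More details: https://blog.csdn.net/weixin_45377629/article/details/124237272
    return np.bincount(n * a[k].astype(int) + b[k], minlength=n ** 2).reshape(n, n)  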

# --------------------------------------------------------------------#
#   Per-class IoU, returns a 1-D array
#   np.maximum(X, Y) compares two arrays element-wise and keeps the larger values.
#       Here Y is the scalar 1, so every denominator is at least 1. This prevents
#       division by zero: a class that appears in neither label nor prediction
#       gets IoU 0 instead of NaN.
# --------------------------------------------------------------------#
def per_class_iu(hist):
    return np.diag(hist) / np.maximum((hist.sum(1) + hist.sum(0) - np.diag(hist)), 1) 

# ----------------------------------------------#
#   array.sum(1): sum of each row, returns a 1-D array
#   recall of a class = correct pixels / ground-truth pixels of that class
# ----------------------------------------------#
def per_class_PA_Recall(hist):
    return np.diag(hist) / np.maximum(hist.sum(1), 1) 

# ----------------------------------------------#
#   array.sum(0): sum of each column, returns a 1-D array
#   precision of a class = correct pixels / pixels predicted as that class
# ----------------------------------------------#
def per_class_Precision(hist):
    return np.diag(hist) / np.maximum(hist.sum(0), 1) 

# ----------------------------#
#   Pixel accuracy PA
# ----------------------------#
def per_Accuracy(hist):
    return np.sum(np.diag(hist)) / np.maximum(np.sum(hist), 1) 

#   Compute mIoU
#   gt_dir: VOC2007/SegmentationClass/, holds the ground-truth segmentation PNGs
#   pred_dir: output path of the predictions; it already contains the 8-bit predicted PNGs
#   png_name_list == image_ids: image names, one per line of val.txt: ['image name 1', 'image name 2', ...]
#   num_classes: number of classes + 1, so 21 for VOC
#   name_classes: class names, e.g. ["_background_","cat","dog",...]
def compute_mIoU(gt_dir, pred_dir, png_name_list, num_classes, name_classes):  
    print('Num classes', num_classes)  
    #   Create an all-zero matrix to hold the confusion matrix
    #   shape: 21x21
    hist = np.zeros((num_classes, num_classes))  
    #   gt_imgs: list of label paths of the validation set, for direct reading
    #   pred_imgs: list of predicted segmentation paths of the validation set, for direct reading
    gt_imgs     = [join(gt_dir, x + ".png") for x in png_name_list]  
    pred_imgs   = [join(pred_dir, x + ".png") for x in png_name_list]  

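    #   Iterate over every (prediction, label) pair
    for ind in range(len(gt_imgs)): 
        #   Read one predicted segmentation and convert it to a numpy array
        pred = np.array(Image.open(pred_imgs[ind]))  
        #   Read the corresponding label and convert it to a numpy array
        label = np.array(Image.open(gt_imgs[ind]))  

        #   If the prediction and the label differ in size, skip this image
        #   ndarray.flatten(): returns a copy collapsed to 1-D
        if len(label.flatten()) != len(pred.flatten()):  
            print(
                'Skipping: len(gt) = {:d}, len(pred) = {:d}, {:s}, {:s}'.format(
                    len(label.flatten()), len(pred.flatten()), gt_imgs[ind],
                    pred_imgs[ind]))
            continue

        #   Compute the 21×21 hist (confusion matrix) for this image and add it
        #   to the running total, since mIoU is computed over the whole set.
        #   The argument order matters: fast_hist expects (label, prediction).
        #   The mask k is applied to the first argument so that ignored label
        #   values (255) are dropped, and rows end up as the true classes. See also:
        #   https://blog.csdn.net/weixin_45377629/article/details/124237272?spm=1001.2014.3001.5501
        hist += fast_hist(label.flatten(), pred.flatten(), num_classes)  
        #   Every 10 images, print the class-averaged metrics over the images processed so far
        #   np.nanmean(): mean that ignores NaN entries
        if ind > 0 and ind % 10 == 0:  
            print('{:d} / {:d}: mIou-{:0.2f}%; mPA-{:0.2f}%; Accuracy-{:0.2f}%'.format(
                    ind,
                    len(gt_imgs),
                    100 * np.nanmean(per_class_iu(hist)),       # multiply by 100 to show a percentage
                    100 * np.nanmean(per_class_PA_Recall(hist)),
                    100 * per_Accuracy(hist)
                )
            )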
    #   After the loop, hist is the confusion matrix over the whole validation set
    #   Compute the per-class values over all validation images
    #   IoUs: 1-D array, IoU of each class
    IoUs        = per_class_iu(hist)            # shape: (21,)
    PA_Recall   = per_class_PA_Recall(hist)     # shape: (21,)
    Precision   = per_class_Precision(hist)     # shape: (21,)
    #   Print the values of each class
    for ind_class in range(num_classes):
        print('===>' + name_classes[ind_class] + ':\tIou-' + str(round(IoUs[ind_class] * 100, 2)) \
            + '; Recall (equal to the PA)-' + str(round(PA_Recall[ind_class] * 100, 2))+ '; Precision-' + str(round(Precision[ind_class] * 100, 2)))

    #   Class-averaged values over all validation images; np.nanmean would skip NaN values,
    #   although the clamped denominators above never produce any
    print('===> mIoU: ' + str(round(np.nanmean(IoUs) * 100, 2)) + '; mPA: ' + str(round(np.nanmean(PA_Recall) * 100, 2)) + '; Accuracy: ' + str(round(per_Accuracy(hist) * 100, 2)))
    # --------------------------------------------------------#
    #   np.array(hist, int): confusion matrix of the validation set as integers, shape (21, 21)
    #   IoUs: IoU of each class, shape (21,)
    #   PA_Recall: recall (per-class pixel accuracy) of each class, from row sums, shape (21,)
    #   Precision: precision of each class, from column sums, shape (21,)
    # --------------------------------------------------------#   
    return np.array(hist, int), IoUs, PA_Recall, Precision
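The self-contained sketch below checks three points from the walkthrough on invented data.

* Pixels labelled 255 are dropped by the mask k. This value is VOC's boundary/void label, assumed here to be the only out-of-range value.
* hist is summed over images before any metric is computed.
* The argument order matters. With the arguments swapped, the mask is applied to the prediction instead, and the 255 label pixels produce bin indices ≥ n². As a result, reshape(n, n) raises a ValueError.

The fast_hist below copies the logic above. The two tiny label/prediction pairs are made up for illustration.

```python
import numpy as np

def fast_hist(a, b, n):
    # Same logic as fast_hist above: a = flattened label, b = flattened prediction
    k = (a >= 0) & (a < n)          # keep only pixels whose label is a valid class
    return np.bincount(n * a[k].astype(int) + b[k], minlength=n ** 2).reshape(n, n)

n = 3
# Two hypothetical 2x3 "images"; 255 marks ignored pixels in the label
labels = [np.array([[0, 0, 1], [1, 255, 2]]), np.array([[2, 2, 0], [0, 1, 255]])]
preds  = [np.array([[0, 1, 1], [1, 0, 2]]),   np.array([[2, 0, 0], [0, 1, 1]])]

hist = np.zeros((n, n))
for label, pred in zip(labels, preds):
    hist += fast_hist(label.flatten(), pred.flatten(), n)   # accumulate over images
print(hist)
print("pixels counted:", hist.sum())    # 12 pixels minus the 2 ignored ones

# Swapped arguments: the 255 label becomes part of the bin index and reshape fails
swap_error = None
try:
    fast_hist(preds[0].flatten(), labels[0].flatten(), n)
except ValueError as e:
    swap_error = str(e)
print("swapped arguments ->", swap_error)
```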

5 Example Output


Load model.
model_data/deeplab_mobilenetv2.pth model, and classes loaded.
Load model done.
Get predict result.
100%|████████████████████████████████████████| 1449/1449 [15:38<00:00,  1.54it/s] 
Get predict result done.
Get miou.
Num classes 21
10 / 1449: mIou-42.30%; mPA-46.51%; Accuracy-95.15%
20 / 1449: mIou-48.72%; mPA-53.71%; Accuracy-94.98%
30 / 1449: mIou-54.87%; mPA-61.61%; Accuracy-94.93%
1430 / 1449: mIou-72.43%; mPA-82.89%; Accuracy-93.59%
1440 / 1449: mIou-72.46%; mPA-82.90%; Accuracy-93.60%
===>background: Iou-93.07; Recall (equal to the PA)-97.0; Precision-95.83
===>aeroplane:  Iou-82.29; Recall (equal to the PA)-92.21; Precision-88.44
===>bicycle:    Iou-41.94; Recall (equal to the PA)-87.43; Precision-44.63
===>bird:       Iou-83.24; Recall (equal to the PA)-92.82; Precision-88.97
===>boat:       Iou-62.78; Recall (equal to the PA)-78.5; Precision-75.81
===>bottle:     Iou-70.33; Recall (equal to the PA)-87.74; Precision-78.0
===>bus:        Iou-93.43; Recall (equal to the PA)-95.99; Precision-97.22
===>car:        Iou-85.4; Recall (equal to the PA)-90.35; Precision-93.97
===>cat:        Iou-88.13; Recall (equal to the PA)-94.02; Precision-93.36
===>chair:      Iou-35.82; Recall (equal to the PA)-56.13; Precision-49.75
===>cow:        Iou-79.71; Recall (equal to the PA)-86.17; Precision-91.4
===>diningtable:        Iou-50.65; Recall (equal to the PA)-54.37; Precision-88.11
===>dog:        Iou-80.32; Recall (equal to the PA)-90.76; Precision-87.47
===>horse:      Iou-78.72; Recall (equal to the PA)-88.03; Precision-88.15
===>motorbike:  Iou-82.17; Recall (equal to the PA)-91.93; Precision-88.56
===>person:     Iou-80.3; Recall (equal to the PA)-86.76; Precision-91.51
===>pottedplant:        Iou-57.04; Recall (equal to the PA)-70.01; Precision-75.49
===>sheep:      Iou-82.15; Recall (equal to the PA)-89.21; Precision-91.22
===>sofa:       Iou-46.13; Recall (equal to the PA)-51.58; Precision-81.36
===>train:      Iou-84.59; Recall (equal to the PA)-89.52; Precision-93.89
===>tvmonitor:  Iou-66.49; Recall (equal to the PA)-72.73; Precision-88.58
===> mIoU: 72.6; mPA: 83.01; Accuracy: 93.59
Get miou done.
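As a sanity check on the log above, the mIoU and mPA in the summary line are unweighted means of the 21 per-class values. Averaging the rounded per-class IoU and recall numbers printed above reproduces mIoU ≈ 72.6 and mPA ≈ 83.01, up to rounding. Accuracy is not such a mean: it is pixel-weighted. This check is not part of the original script.

```python
# Per-class values copied from the log above (percent, rounded to 2 decimals)
iou = [93.07, 82.29, 41.94, 83.24, 62.78, 70.33, 93.43, 85.4, 88.13, 35.82, 79.71,
       50.65, 80.32, 78.72, 82.17, 80.3, 57.04, 82.15, 46.13, 84.59, 66.49]
recall = [97.0, 92.21, 87.43, 92.82, 78.5, 87.74, 95.99, 90.35, 94.02, 56.13, 86.17,
          54.37, 90.76, 88.03, 91.93, 86.76, 70.01, 89.21, 51.58, 89.52, 72.73]

miou = sum(iou) / len(iou)        # unweighted mean over the 21 classes
mpa = sum(recall) / len(recall)
print(f"mIoU ~ {miou:.2f}, mPA ~ {mpa:.2f}")
```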

