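        In machine learning, a confusion matrix is a table for visualizing how a model's predictions compare with the true labels. It is part of model evaluation and is mostly used to judge the quality of a classifier. The term is used mainly in supervised learning; in unsupervised learning the same table is usually called a matching matrix.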


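        In a confusion matrix, each row represents an actual class (the ground truth) and each column represents a class predicted by the classifier. The element (i, j) stores the number of samples whose actual class is type(i) and that the classifier identified as type(j).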
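The definition of element (i, j) can be sketched in a few lines of plain Python. This is only an illustration: the helper name build_confusion_matrix and the sample labels are hypothetical, and classes are assumed to be encoded as integers 0..n_classes-1.

```python
def build_confusion_matrix(y_true, y_pred, n_classes):
    """Return an n_classes x n_classes table: rows = actual class, columns = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        # One more sample of class `actual` that was identified as class `predicted`
        matrix[actual][predicted] += 1
    return matrix


# Hypothetical labels, for illustration only
y_true = [0, 0, 1, 2, 2, 0, 2, 0, 1]
y_pred = [0, 1, 1, 2, 0, 0, 2, 0, 2]

cm = build_confusion_matrix(y_true, y_pred, n_classes=3)
for row in cm:
    print(row)
```

Every sample lands in exactly one cell, so the entries always sum to the number of samples.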


2. The confusion matrix for binary classification


Figure 1. Confusion matrix for binary classification

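  1. TP (True Positive): a positive sample predicted as positive; the true label is 0 and the prediction is also 0 (here 0 is assumed to denote the positive class and 1 the negative class)
  2. FN (False Negative): a positive sample predicted as negative; the true label is 0 and the prediction is 1
  3. FP (False Positive): a negative sample predicted as positive; the true label is 1 and the prediction is 0
  4. TN (True Negative): a negative sample predicted as negative; the true label is 1 and the prediction is also 1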
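As a minimal sketch, the four counts can be read off the matrix returned by sklearn's confusion_matrix(). The sample labels below are hypothetical, and the code follows the convention above that 0 is the positive class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels; as in the text, 0 is the positive class and 1 the negative class
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# labels=[0, 1] puts the positive class (0) in the first row and column,
# so the matrix is laid out as [[TP, FN], [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tp, fn, fp, tn = cm.ravel()

print(cm)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

If 1 were the positive class instead, the same call would give the layout [[TN, FP], [FN, TP]], so the unpacking order depends on which label is treated as positive.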






3. The confusion matrix for multi-class classification


Figure 2. Confusion matrix for multi-class classification
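To give a concrete idea of the multi-class case, here is a small sketch with three classes. The class names and labels are made up for illustration; the labels argument fixes the order of rows and columns.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical three-class labels, for illustration only
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "bird", "bird", "cat", "bird", "cat", "dog"]
y_pred = ["cat", "dog", "dog", "bird", "cat", "cat", "bird", "cat", "bird"]

# Rows follow the order in `labels` (actual class); columns use the same order (predicted class)
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Diagonal entries are correct predictions; every off-diagonal entry is a misclassification
correct = int(np.trace(cm))
misclassified = int(cm.sum()) - correct
print(f"correct={correct}, misclassified={misclassified}")
```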



4. Visualizing the confusion matrix

4.1 sklearn confusion_matrix() and plot_confusion_matrix()

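        The sklearn.metrics package provides the confusion_matrix() function, which builds a confusion matrix from the predictions and the true labels (ground truth). A second function, plot_confusion_matrix(), draws the confusion matrix directly as a figure. Note that plot_confusion_matrix() was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of the ConfusionMatrixDisplay class.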



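# Example 1: Using sklearn plot_confusion_matrix
# Ref: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html
# Note: plot_confusion_matrix() was removed in scikit-learn 1.2,
# so this example needs scikit-learn < 1.2.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import plot_confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Build a synthetic binary classification dataset and hold out a test split
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a support vector classifier
clf = SVC(random_state=0)
clf.fit(X_train, y_train)

# Compute the confusion matrix on the test split and draw it
plot_confusion_matrix(clf, X_test, y_test)
plt.show()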


 Figure 3. Example confusion matrix drawn with plot_confusion_matrix()

4.2 seaborn heatmap()

        Ref: https://www.stackvidhya.com/plot-confusion-matrix-in-python-and-why/


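        The main parameters of Seaborn's heatmap() are listed below. data is required; the others are optional and control how the figure looks. See the Seaborn heatmap() documentation for the remaining parameters: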

  • data – A rectangular dataset that can be coerced into a 2D array. Here you pass the confusion matrix you have already computed.
  • annot=True – Writes the data value in each cell of the plotted matrix. By default no values are written.
  • cmap='Blues' – The name of the matplotlib colormap to use.

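        heatmap() returns the matplotlib Axes it drew on. You can store it in a variable and use it afterwards to adjust the figure, for example to set the title, the x-axis and y-axis labels, and the tick labels of both axes. Note that you can also pass an existing Axes to heatmap() through the ax parameter, in which case the heatmap is drawn onto that Axes, as in the example below.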

  • Title – Labels the whole figure. Use set_title() to set it.
  • Axes labels – Name the x-axis and the y-axis. Use set_xlabel() to set the x-axis label and set_ylabel() to set the y-axis label.
  • Tick labels – The labels shown at the tick marks of each axis. Pass them as a list in the same order as the classes in the confusion matrix, which confusion_matrix() sorts in ascending order by default. Use xaxis.set_ticklabels() for the x-axis and yaxis.set_ticklabels() for the y-axis.

        Finally, call plt.show() to display the figure.

# Example 2: Using seaborn heatmap
# Ref: https://www.stackvidhya.com/plot-confusion-matrix-in-python-and-why/
# Reuses clf, X_test and y_test from Example 1.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

print('Example 2: Using seaborn heatmap for confusion matrix visualization')
f, ax = plt.subplots()
y_pred = clf.predict(X_test)
# Rows are true classes, columns are predicted classes, both in the order [0, 1]
C2 = confusion_matrix(y_test, y_pred, labels=[0, 1])
sns.heatmap(C2, annot=True, ax=ax)  # draw the heatmap onto ax

ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('predicted')
ax.set_ylabel('true')
plt.show()


 Figure 4. Example confusion matrix drawn with seaborn heatmap()
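The cmap option and the tick labels described earlier can be combined in a self-contained sketch. The sample labels and the tick label texts are hypothetical, and fmt="d" is an extra heatmap() option used here to print the counts as integers.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Hypothetical labels; 0 is the positive class, 1 the negative class
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

fig, ax = plt.subplots()
# annot=True writes the counts into the cells; fmt="d" formats them as integers
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)

ax.set_title("Confusion matrix")
ax.set_xlabel("Predicted class")
ax.set_ylabel("Actual class")

# Tick labels are given in the same order as labels=[0, 1] above
tick_labels = ["0 (positive)", "1 (negative)"]
ax.xaxis.set_ticklabels(tick_labels)
ax.yaxis.set_ticklabels(tick_labels)
```

As with the other examples, call plt.show() at the end to display the figure.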


