## Introduction to the Confusion Matrix

The *Topics of Machine Learning* series has invested a healthy amount of effort in presenting the various tools available for building more sophisticated machine learning models. This article follows suit, providing a technique for validating your model’s performance with the confusion matrix. Within this series, we have elaborated extensively on the various categories of machine learning models. Our first article in this series provided an overview of all of these model categories, an investigation which you may find here. After providing this broad introduction, we began diving individually into the specifics and algorithms associated with each of these models. This began with an analysis of supervised learning models, particularly their classification and regression algorithms.

Next, we followed this with a discussion of the features of unsupervised learning models. We then differentiated the methodologies of batch and online learning, which you may investigate here. We finally followed up with an elaboration of the differences between instance-based and model-based learning, found here.

With the individual algorithms explored and discussed in depth, one of our most recent articles investigated the fundamentals of classification algorithms. Along that line of thought, we began by exploring the MNIST data set and its use in machine learning classification algorithms. Having developed this classifier in one of our preceding articles, we followed up with an analysis of one particular performance measurement of machine learning algorithms. In that discussion, we elaborated on the accuracy metric for judging machine performance using the cross-validation technique. Here, we continue this thread of performance measurement through the use of the confusion matrix. Let’s begin.

## Conceptualizing the Confusion Matrix

*Purpose of Measuring Performance*

In our previous article, we elaborated a bit upon the theory that underlies performance measurement. However, we will provide a brief discussion here. Machine learning models rapidly assign values to the data items within a set. How well these models carry out this task remains unknown unless we have some means of quantifying machine performance. Performance measures allow us to quantify aspects of our model, including its accuracy, precision, and facets of efficiency.

*External Sources*

The validity of performance measures is confirmed in a variety of contexts, but bioinformatician Yasen Jiao presents it elegantly in his publication “Performance measures in evaluating machine learning based bioinformatics predictors for classifications.” Jiao presents the importance of machine learning performance measurement as a consequence of the potential for an algorithm to over-fit or under-fit data. The author contends, “Because machine learning algorithms usually use existing data to establish predictive models, it is possible that a predictive model is over-fitted or over-optimized on the existing data.” Here, Jiao points out that over-fitting is a consequence of the manner in which the model is built from existing data, and that it creates inconsistencies which we must account for. Jiao clarifies his definition of over-fitting or over-optimizing by stating that the algorithm performs well on existing data but does not have sufficient flexibility to operate well on incoming data.

Performance measures are important because they bear so heavily on the validity of our results, and thus, our insights. Jiao supports this argument in stating that “the prediction performance drops drastically when it is applied in practical studies with novel data.” In order to correct for the bias that we may introduce into our model if over-optimization has not been accounted for, Jiao finally argues that “when applying these computational predictors, it is vitally important to understand their mechanisms and the conditions of their performances in the first place.” We share this conviction as well.

*Measurement of Precision*

The concept of precision is colloquially understood as the quality of being exact or accurate. However, we know from our previous discussions that precision is not the same thing as accuracy. In measurement, precision describes how close a series of repeated measurements are to one another. In classification, the term has a narrower meaning: of all the instances the model labels as positive, precision is the fraction that truly are positive. Precision is of great importance to machine learning performance assessments, as it is not sufficient for a model to flag positives; its positive predictions must also be trustworthy. As stated in this article, “Precision, or the positive predictive value, refers to the fraction of relevant instances among the total retrieved instances.” Therefore, in computing precision, through whatever mechanism we decide, we really calculate the share of positive predictions that are correct.

*Measurement of Recall*

Whereas precision asks how many of the positive predictions are correct, recall, also known as the true positive rate, asks how many of the actual positives the model finds. The true positive rate is the ratio of correctly identified positive instances to the sum of true positives and false negatives. Recall is of particular importance because it tells us how often a positive instance is detected relative to how often it is missed. According to this article, “Recall, also known as sensitivity, refers to the fraction of relevant instances retrieved over the total amount of relevant instances.” As such, recall allows us to understand the ability of our machine to correctly identify positive instances.

*What is the Confusion Matrix?*

This methodology of performance measurement was not so elegantly named. Nevertheless, it provides us with the ability to compute the precision and recall of our machine learning algorithms. More generally speaking, the confusion matrix keeps count of the number of times that items of one class are classified as items of another class.

Each row of a confusion matrix represents the actual class of the objects we seek to look at. Each column of that row represents a class that the data item may be classified as. Therefore, if we are looking at class m, we first go to the m-th row of the confusion matrix. If we then look at the m-th column of the m-th row, our hope is that this will be the greatest value in the row (if our algorithm has decent recall). We can then look at the other columns in the row to see what class m is most often misclassified as.
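To make the row and column reading concrete, here is a minimal sketch on a made-up three-class problem. The labels below are hypothetical toy values chosen for illustration and are not taken from the MNIST experiment; only scikit-learn's `confusion_matrix` function is assumed.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels for a three-class problem (classes 0, 1 and 2).
actual = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

# Rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(actual, predicted, labels=[0, 1, 2])
print(matrix)

# Inspect the row for class m = 2.
m = 2
row = matrix[m].copy()
correctly_classified = row[m]

# Zero out the diagonal entry, then find the most frequent wrong column.
row[m] = 0
most_confused_with = int(row.argmax())
print(correctly_classified, most_confused_with)
```

Reading the row for class 2 tells us how many 2s were recognized and which other class absorbed most of the mistakes.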

## Encoding the Confusion Matrix

*The Training Model*

Our previous article created a basic machine learning model of the MNIST data set using the stochastic gradient descent algorithm. We demonstrate the code for executing this functionality as follows:

Our first action in approaching the MNIST data set was splitting the data into training and testing data, and subdividing these groups on the basis of whether they included labels or did not include labels. We acquire these sets of information by separately calling the ‘data’ and ‘target’ arrays. These two arrays exhibit significant differences. The ‘data’ array has a shape of 70,000×784 because each image consists of 784 pixels. The ‘target’ array, alternatively, is one-dimensional and holds only 70,000 objects, as these represent the labels. Keep in mind that for both of these array objects, we must also further separate them into training and testing data.

After splitting the data into the different groups, we trained a machine learning model that operates as a binary classifier by differentiating images which are fives from those which are not fives. To train this stochastic gradient descent algorithm as our binary classifier, we first established the proper data sets. That meant deriving the five labels from our label data set for both the training and the testing data.

Once we do this, all we must do is create the stochastic gradient descent classifier from scikit-learn (SGDClassifier). Then we may pass in the training data and the training labels for the fives. We can then simply use the ‘predict’ function and pass in the pixel data of some particular digit. If that digit is five, to some degree of accuracy, the function will return true. If that digit is not five, to some degree of accuracy, the function will return false.
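The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the original listing: it swaps in scikit-learn's small bundled digits data set (1,797 images of 64 pixels) for MNIST so that it runs without a download, and the split point of 1,500 and the `random_state` value are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Stand-in data (assumption): 1,797 images of 8x8 pixels.
# MNIST itself would be 70,000 x 784.
digits = load_digits()
X, y = digits.data, digits.target
print(X.shape, y.shape)

# Hypothetical split point: first 1,500 rows train, the remainder test.
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Binary targets: True for fives, False for every other digit.
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

# Train the stochastic gradient descent classifier on the binary task.
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

# predict expects a 2-D array, so a single image is passed as a one-row slice.
prediction = sgd_clf.predict(X_test[:1])
print(prediction, y_test_5[0])
```

With the real MNIST data, only the loading step and the split point would change; the labeling, fitting, and prediction calls stay the same.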

*Cross-Validation*

Our previous article elaborated on the intricacies of computing the cross-validation of a machine learning model. While this is not the intent of this article, we will at least present the code used for executing this functionality:
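A minimal sketch of that cross-validation step might look like the following. It assumes the bundled scikit-learn digits data set as a stand-in for MNIST, three folds, and accuracy as the scoring metric; the split point and `random_state` are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption): the bundled digits set instead of MNIST.
X, y = load_digits(return_X_y=True)
X_train, y_train = X[:1500], y[:1500]
y_train_5 = (y_train == 5)

sgd_clf = SGDClassifier(random_state=42)

# Three-fold cross-validation: one accuracy score per held-out fold.
scores = cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")
print(scores)
```

Each entry of `scores` is the accuracy measured on one fold that the model did not see while training.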

*Calculations with the Confusion Matrix*

In order to derive a computation of the confusion matrix, first, we consolidate the prediction values. These prediction values can then be compared to the target values to observe how correct the algorithm was.

In the beginning of our calculation, the first tool we use is the cross_val_predict function. This may seem quite similar to the cross_val_score function used for computing cross-validation scores on a data set. The difference is that, although cross_val_predict also performs K-fold cross-validation, it does not return evaluation scores. Instead, it returns the prediction made for each instance while that instance sat in the test fold, so every prediction comes from a model that never saw that instance during training.

*The Code*

With test fold predictions acquired, the confusion matrix may now be calculated using scikit-learn’s confusion_matrix function. Let’s take a look at how we might code the confusion matrix:
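As a hedged sketch of this step, the snippet below chains `cross_val_predict` and `confusion_matrix`. It again assumes the bundled digits data set in place of MNIST, so the counts it prints will differ from those discussed in this article; the variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Stand-in data (assumption): the bundled digits set instead of MNIST.
X, y = load_digits(return_X_y=True)
X_train, y_train = X[:1500], y[:1500]
y_train_5 = (y_train == 5)

sgd_clf = SGDClassifier(random_state=42)

# One out-of-fold prediction per training instance, using three folds.
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

# Rows: actual class (non-five, five). Columns: predicted class.
cm = confusion_matrix(y_train_5, y_train_pred)
print(cm)
```

Because every instance receives exactly one prediction, the four entries of the matrix add up to the number of training instances.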

We first call the cross_val_predict function. Here, we pass in the stochastic gradient descent classifier along with its inputs. These include the training data and the training labels for images of fives. Additionally, we set the ‘cv’ argument equal to three, which sets the number of cross-validation folds.

Next, we import the confusion_matrix function from the scikit-learn library. We then call this function using the training labels for the five images and the predictions from the cross_val_predict function, and print the output. Take a look at the output confusion matrix procured via this code:

*Analysis*

Remember, each row of the matrix represents an actual class. The first row represents the negative class, what we may understand as the non-fives class. The second row denotes the positive class, the class which represents the fives. The columns, in turn, represent the predicted class: first the non-fives, followed by the fives.

We can pick apart this confusion matrix in much greater detail. The first row and first column holds the correctly classified non-fives, and the non-fives belong to the negative class. Thus, the first item in the first row counts the true negatives. The second position of the first row holds the non-fives which were incorrectly classified as fives. Because fives represent the positive class, these are false positives. In the second row, the first position holds the fives which were incorrectly classified as non-fives. Thus, these are false negatives. Finally, the second position of the second row consists of values which were correctly classified as fives. Therefore, we may call these true positives. As we expect, the largest values appear on the main diagonal, indicating that for the most part, our algorithm worked.
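The four cells can be pulled out by name. The sketch below uses hypothetical toy labels (1 for a five, 0 for a non-five) rather than the MNIST predictions, and relies on the fact that flattening a 2×2 scikit-learn confusion matrix yields the cells in the order true negatives, false positives, false negatives, true positives.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: 1 marks a five, 0 marks a non-five.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)

# Row 0 = actual non-fives, row 1 = actual fives;
# column 0 = predicted non-five, column 1 = predicted five.
tn, fp, fn, tp = cm.ravel()
print("true negatives:", tn)
print("false positives:", fp)
print("false negatives:", fn)
print("true positives:", tp)
```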

## Executing Computations

*Computing Precision*

We previously defined precision as the fraction of positive predictions that are correct. We have a readily available mathematical formula which permits us to compute the precision of the algorithm. The formula looks like:
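Writing TP for the number of true positives and FP for the number of false positives (shorthand introduced here for convenience), the standard definition of precision can be expressed as:

$$
\text{precision} = \frac{TP}{TP + FP}
$$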

From this formula, we acquire a much more robust understanding of what precision represents. Here, we see that precision is the quotient of the number of true positives and the sum of true positives and false positives. By this logic, we may understand precision as the ratio between the true positives and the total number of images the model identified as fives.

The precision of a machine learning algorithm is readily available through the scikit-learn function ‘precision_score’. All we must do is pass in the list containing the true labels for the fives and the list containing the predictions for these values. Let us take a look at how we encode this functionality in our Python script:
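As a minimal sketch, `precision_score` can be checked against a hand computation. The labels are hypothetical toy values (1 for a five, 0 for a non-five), so the result is not the score reported for the MNIST model.

```python
from sklearn.metrics import precision_score

# Hypothetical toy labels: 1 marks a five, 0 marks a non-five.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# Library computation: true labels first, predictions second.
precision = precision_score(y_true, y_pred)

# Hand computation: true positives over everything predicted positive.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
manual_precision = tp / (tp + fp)

print(precision, manual_precision)
```

Here three of the five positive predictions are correct, so both computations agree on a precision of three fifths.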

When we execute the code, we obtain the following output:

We see here that the code reveals a precision score of approximately 0.837. This may be understood as a precision of 83.7%, which suggests that when the model labels an image as a five, it is correct ~84% of the time.

*Computing Recall*

In addition to the mathematical formula which computes precision from the confusion matrix, we also have one available which allows us to compute the recall of our machine learning algorithm. Let’s take a look at the mathematical formula for the computation of recall:
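Writing TP for the number of true positives and FN for the number of false negatives (shorthand introduced here for convenience), the standard definition of recall can be expressed as:

$$
\text{recall} = \frac{TP}{TP + FN}
$$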

From this representation of recall in the present formula, we may acquire a better understanding of what recall denotes. We may understand recall as the quotient of the true positives and the total number of actual fives. Recall can also be computed directly, however, with the recall_score function from scikit-learn. Let’s take a look at how we can encode this in our Python script:
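A minimal sketch of the `recall_score` call, compared against a hand computation, is shown below. The labels are hypothetical toy values (1 for a five, 0 for a non-five) and do not reproduce the MNIST result.

```python
from sklearn.metrics import recall_score

# Hypothetical toy labels: 1 marks a five, 0 marks a non-five.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# Library computation: true labels first, predictions second.
recall = recall_score(y_true, y_pred)

# Hand computation: true positives over all actual positives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
manual_recall = tp / (tp + fn)

print(recall, manual_recall)
```

Three of the four actual fives are found, so both computations agree on a recall of three quarters.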

As with the precision score function, the recall score function takes as input the true labels for the fives and the predictions. The output for this code appears as follows:

From this we see that our recall score is a bit lower than our precision score at 0.651. This implies that the model correctly identifies only ~65% of all of the fives in the data set.

## The Take Away

The present article discussed how to quantify the efficacy of classifier performance with the metrics of precision and recall. In particular, we used the confusion matrix as the basis for computing precision and recall. Precision and recall have their roots in statistics, yet they apply just as effectively to evaluating a machine learning model. The scikit-learn library provides several helpful functions for computing the confusion matrix of our model, making this task quite simple.

We employed the code from our previous article, which involved the creation of a binary classifier via the MNIST set. Here, we established the precision and recall of the model using the confusion matrix technique. From the values represented within the confusion matrix, the precision and recall can be readily computed. Our next article in the Topics of Machine Learning series explores the theory which underlies the concepts of precision and recall. We look forward to seeing you there.