Introduction to Multi-Class Classification
Previous Efforts of the Machine Learning Series
The Topics of Machine Learning series took a brief hiatus from exploring models in order to examine the various performance measures used to quantify the efficiency of machine learning models. This particular article deviates from that tune to explore multi-class classification. Briefly, thus far in the Topics of Machine Learning series, we have elaborated extensively on the various machine learning model categories that exist. Our first article in this series provided an overview of all of these model categories, an investigation which you may find here. After providing this broad introduction, we began diving individually into the specifics and algorithms associated with each of these models. This began with an analysis of supervised learning models, particularly their classifier and regression algorithms.
Next, we followed this with a discussion of the features of unsupervised learning models. We then differentiated the methodologies of batch and online learning, which you may investigate here. We finally followed up with an elaboration of the differences between instance-based and model-based learning, found here.
With the individual algorithms explored and discussed in depth, in one of our most recent articles we investigated the fundamentals of classification algorithms. Along that line of thought, we began by exploring the MNIST data set and its use in machine learning classification algorithms. Having developed this classifier in one of our preceding articles, we followed up with an analysis of one particular performance measurement of machine learning algorithms. In that discussion, we elaborated on the accuracy metric for discerning machine performance using the cross-validation technique. This was then followed by another discussion of performance measurement through the use of the confusion matrix.
One of the primary features of the confusion matrix is that it enables the computation of precision and recall. These computations were discussed at great length in our previous article. In the most recent article, we also focused on the underlying theory associated with precision and recall in order to provide a better understanding of what these metrics actually represent. An additional article focused on measuring the performance of our machine learning model using the ROC curve. Here, however, we change gears and return to exploring machine learning models. In particular, we focus on multi-class classification. Let’s begin.
Conceptualizing Multi-Class Classification
On Multi-Class Classification
Our previous investigation of classifier machine learning algorithms focused on binary classifiers. However, these are rather simple models, insufficient for complex modeling. Multi-class classification relies on algorithms capable of differentiating more than just two classes of data.
Several multi-class classification algorithms can handle these higher numbers of data classes natively. These include the Random Forest classifier as well as the naive Bayes classifier.
Interestingly, a series of binary classifiers can also be combined within a single system to perform multi-class classification together. For example, our previous stochastic gradient descent classifier was a binary classifier that differentiated fives from non-fives. We can use a series of such binary classifiers to cover the entire data set: one that differentiates ones from non-ones, another that differentiates twos from non-twos, and so on. In the case of our MNIST data set, therefore, we would have a total of ten binary classifiers. This strategy is known as the one-versus-all (OvA) strategy.
Creating a series of binary classifiers for multi-class classification can be a particularly tedious task, especially if you build one binary classifier for every category: the ones, the twos, the threes, and so on. Alternatively, we can keep our binary classifiers but use them to compare pairs of digits. For example, we can have one binary classifier that distinguishes zeros from ones, another that distinguishes zeros from twos, another for ones versus twos, and so on.
This particular strategy is known as the one-versus-one (OvO) strategy of classification. While it relies on binary classifiers, it executes the task of multi-class classification. For an OvO multi-class classification system, the number of classifiers required is N × (N − 1)/2, where N is the number of classes. For our previous classifier of the MNIST data set, with its ten digit classes, this means we would need forty-five classifiers.
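The count above follows directly from the formula. A quick sketch in Python (the helper function name is our own, for illustration):

```python
# Number of pairwise binary classifiers needed for an OvO strategy:
# N * (N - 1) / 2, where N is the number of classes.
def ovo_classifier_count(n_classes: int) -> int:
    return n_classes * (n_classes - 1) // 2

# For the ten digit classes of MNIST:
print(ovo_classifier_count(10))  # -> 45
```

Each class is paired with every other class exactly once, which is why the two-class case collapses back to a single binary classifier.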
Ensemble Methods of Multi-Class Classification
The OvO and OvA methodologies represent ensemble methodologies for executing a multi-class classification task. A wide variety of empirical studies have reported that decomposition and ensemble methods can increase performance on multi-class classification problems. However, one inherent flaw of the OvO decomposition strategy is the problem of non-competent classifiers. According to Zhou, “A binary classifier can only distinguish the observations from the two classes used in training set, and therefore the binary classifier has no capability to discriminate the observations from other classes.”
When an ensemble encounters a new observation, it is not initially known which classifiers are competent for it and which are not. The important signal, then, is the score the observation receives from each of the classifiers. Zhou continues, “If all outputs from both the competent and noncompetent classifiers are taken into account equivalently in the ensemble stage, the non-competent classifiers may mislead the correct classification of the new observation.” Therefore, when implementing ensemble methodologies, we must take care that the outputs of non-competent classifiers do not mislead the final classification.
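To make the OvO ensemble concrete, here is a minimal sketch using scikit-learn's explicit `OneVsOneClassifier` wrapper. It assumes scikit-learn is installed and uses the library's small built-in 8×8 digits dataset as a lightweight stand-in for MNIST; the forty-five fitted pairwise classifiers can be inspected directly:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsOneClassifier

# The built-in 8x8 digits dataset serves as a stand-in for MNIST here.
digits = load_digits()

# Wrap a binary SGD classifier in an explicit one-versus-one scheme.
ovo_clf = OneVsOneClassifier(SGDClassifier(random_state=42))
ovo_clf.fit(digits.data, digits.target)

# One fitted binary classifier per pair of classes: 10 * 9 / 2 = 45.
print(len(ovo_clf.estimators_))  # -> 45
```

At prediction time, the wrapper aggregates the pairwise votes, which is exactly the ensemble stage where non-competent classifiers can interfere.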
Encoding Multi-Class Classification
Fortunately for us, when we use Scikit-Learn to create a multi-class classification machine learning model from binary classifiers, Scikit-Learn detects the multi-class task automatically, making it quite easy for us to execute an OvA method.
Rather than using our manipulated fives-only data as input, we will use our unadulterated labeled data. First, take a look at the code for creating the stochastic gradient descent machine learning algorithm:
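The original snippet is not reproduced here, but a minimal sketch of the construction, assuming scikit-learn's `SGDClassifier` (the `random_state` value is illustrative, chosen only for reproducibility), looks like this:

```python
from sklearn.linear_model import SGDClassifier

# Instantiate a stochastic gradient descent classifier; the fixed
# random_state is illustrative, used only to make results reproducible.
sgd_clf = SGDClassifier(random_state=42)
```

Nothing about this constructor is specific to multi-class work; the multi-class behavior comes from the labels we pass at fitting time.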
Once we have created our stochastic gradient descent classifier, we can execute an OvA methodology simply by calling its .fit() method. When we do this, rather than using our fives-only training data, we employ the full training data with all of its labels. Take a look at the code for creating this OvA strategy.
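A self-contained sketch of that fit, again assuming scikit-learn and substituting the library's built-in 8×8 digits dataset for the full MNIST set (the train/test split index is illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# The built-in 8x8 digits dataset stands in for the full MNIST set here.
digits = load_digits()
X_train, y_train = digits.data[:1500], digits.target[:1500]

sgd_clf = SGDClassifier(random_state=42)
# Fit on the fully labeled data (digits 0-9), not just "five / not-five";
# for multi-class targets, SGDClassifier applies one-versus-all internally.
sgd_clf.fit(X_train, y_train)

some_digit = digits.data[1500:1501]
print(sgd_clf.predict(some_digit))                  # the predicted digit
print(sgd_clf.decision_function(some_digit).shape)  # (1, 10): ten scores
```

The shape of the decision scores makes the internal OvA machinery visible: one score per underlying binary classifier, with the highest-scoring class returned by predict.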
This code trains the stochastic gradient descent classifier using target classes from the digits zero to nine, after which it can identify the digit present in a given image. Quite simply, under the hood it trains ten binary classifiers and reports the class whose classifier produces the highest score.
The Take Away
The present article has explored the utility and implementation of multi-class classification of data with machine learning algorithms. Several options are available that allow users to classify multiple different items in a data set, and Scikit-Learn offers several algorithms that handle this automatically. Additionally, we investigated ensemble methods for machine learning, which incorporate binary classifiers into a larger system; these include the OvO and OvA methods. Hopefully this tutorial has been helpful to you. Our next article seeks to investigate the analysis of errors that may manifest in our machine learning models. We look forward to seeing you there.