Introduction to Learning Curves
For quite some time now in our machine learning series, we have explored various aspects of regression algorithms in machine learning. Our first look at this concept came in our initial article on supervised machine learning models, which may be found here. After exploring a variety of machine learning models and working through several examples, we investigated regression in detail as it pertains to these models. We began with a succinct overview of linear regression models in machine learning. We then elaborated on these algorithms in detail, beginning with gradient descent, followed by batch gradient descent and stochastic gradient descent. In our most recent article, we examined polynomial regression and its mathematical underpinnings. In this article, we focus on learning curves and how they can be used to evaluate machine learning models.
Conceptualizing Learning Curves
Overview of Learning Curves
“Probabilistic Deep Learning With Python” gives excellent insight into the role of the learning curve in machine learning and neural networks. It is one of the primary tools I have used to deepen my knowledge of the field. According to this resource, learning curves are employed in a variety of fields to represent how learning and experience relate to each other. Typically, experience serves as a proxy for time and is plotted on the x-axis of the learning curve, while the degree of learning is represented on the vertical axis. Learning is understood as a metric reflecting mastery of a particular skill or topic.
Learning curves take a variety of shapes, each with its own interpretation. For example, a sigmoidal learning curve denotes a brief period of little learning, followed by rapid learning, terminating in a plateau where more experience no longer confers an increase in learning. A linear learning curve exhibits a constant increase in learning with respect to experience. At the other end of the spectrum, a hyperbolic learning curve exhibits a long initial period in which added experience barely affects learning, after which learning increases rapidly.
According to another leading source on machine learning, the learning curve is used widely in machine learning, but differs in that it relates performance to experience. Here, performance is a proxy for learning. Performance is the error rate or accuracy of the learning system, while experience may be the number of training examples used for learning or the number of iterations used in optimizing the model parameters. The machine learning curve is useful for many purposes, including comparing different algorithms, choosing model parameters during design, adjusting optimization to improve convergence, and determining the amount of data needed for training.
In machine learning, a learning curve shows the validation and training score of an estimator for varying numbers of training samples. It is a tool to find out how much a machine learning model benefits from adding more training data and whether the estimator suffers more from a variance error or a bias error. If both the validation score and the training score converge to a value that is too low as the training set grows, the model will not benefit much from more training data.
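Scikit-Learn ships a helper that computes exactly these two scores over a range of training set sizes. The following is a minimal sketch, assuming an illustrative toy dataset and a plain linear model (the data-generating function and random seed are my own assumptions, not from the original example):

```python
# Sketch: training and validation error as the training set grows,
# via scikit-learn's learning_curve helper. Dataset is illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

rng = np.random.RandomState(42)
X = rng.rand(200, 1) * 6 - 3                         # 200 samples in [-3, 3]
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.randn(200)

# Evaluate at 5 training-set sizes, averaging over 5 cross-validation folds.
train_sizes, train_scores, valid_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error")

# Scores are negated MSE; flip the sign and average across folds.
train_mse = -train_scores.mean(axis=1)
valid_mse = -valid_scores.mean(axis=1)
```

Plotting `train_mse` and `valid_mse` against `train_sizes` yields the learning curve; the gap between the two lines hints at variance error, while two lines converging at a high error hint at bias error.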
The learning curve rests on mathematical concepts that must be understood in order to create effective predictive algorithms. For that reason, I highly recommend at least checking out the following resource, which breaks down the mathematical features of learning curves to show how to build highly efficient models. Check it out here.
Discerning the Fit
Overfitting and underfitting are among the primary challenges in creating an effective machine learning model. In our previous investigations, we demonstrated that cross-validation metrics and other performance measures can gauge how well a machine learning model performs. The learning curve, however, can provide insight into whether our model is too simplistic or too complex.
With machine learning, the learning curve demonstrates performance by plotting the model’s error on the training and validation sets as a function of the training set size. The following example was made in consideration of one of the leading machine learning sources, “Hands-On Machine Learning with SciKit-Learn”. This is one of the primary tools I used in advancing my knowledge of data science, and I would recommend you take a look at it. You may find it here.
Take a look at the following code which helps to create a learning curve based upon our data:
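The code block itself did not survive in this draft, so here is a sketch in the same spirit: a function that trains the model on progressively larger slices of the training set and records the error on both sets. The dataset, seed, and function name are illustrative assumptions rather than the book’s exact listing:

```python
# Sketch: plot training and validation RMSE versus training set size.
# Dataset and model here are illustrative, not the original example's.
import numpy as np
import matplotlib
matplotlib.use("Agg")                    # headless backend so this runs anywhere
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])     # train on the first m samples
        train_errors.append(mean_squared_error(
            y_train[:m], model.predict(X_train[:m])))
        val_errors.append(mean_squared_error(
            y_val, model.predict(X_val)))
    plt.plot(np.sqrt(train_errors), "r-+", linewidth=2, label="train")
    plt.plot(np.sqrt(val_errors), "b-", linewidth=3, label="validation")
    plt.xlabel("Training set size")
    plt.ylabel("RMSE")
    plt.legend()
    return train_errors, val_errors

rng = np.random.RandomState(42)
X = rng.rand(100, 1) * 6 - 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.randn(100)
plot_learning_curves(LinearRegression(), X, y)
```

Returning the error lists alongside the plot makes the function easy to inspect programmatically as well as visually.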
When we execute this code, we obtain a plot with a series of learning curves which appears as follows:
If you want to get a more comprehensive look at the intricacies of the code needed for creating this machine learning curve, perhaps take a look at this source. It contains this example but gives more detail into implementing this code.
SciKit-Learn’s Mean Squared Error Function
The SciKit-Learn ‘metrics’ module provides a variety of essential functions that facilitate the computation of loss functions, scores, and utility measurements. One of these, which we use in the example above, is the mean squared error function. As you may infer, the mean squared error function computes the mean squared error, which is particularly helpful for quantifying quadratic loss. The mathematical relationship established by the mean squared error function appears as follows:
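The equation itself did not carry over into this draft; the standard definition of mean squared error over n samples, consistent with the function’s documentation, is:

```latex
\mathrm{MSE}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

where y_i is the true target value and ŷ_i is the model’s prediction for the i-th sample.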
A variety of alternative ‘metrics’ module functions exist, including the mean squared logarithmic error, the median absolute error, and the R² score. All of these are discussed in great detail in the following source. I would recommend this resource, as it thoroughly examines not only the mathematical background of these functions, but also the code needed to use them. Definitely consider giving it a preview here.
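To make these functions concrete, here is a brief sketch calling a few of them on toy values (the numbers are arbitrary examples, chosen only to illustrate the calls):

```python
# Sketch: a few regression metrics from sklearn.metrics on toy values.
from sklearn.metrics import (mean_squared_error, mean_squared_log_error,
                             median_absolute_error, r2_score)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)        # average squared residual
med_ae = median_absolute_error(y_true, y_pred)  # robust to outlier residuals
r2 = r2_score(y_true, y_pred)                   # 1.0 would be a perfect fit

# mean_squared_log_error requires non-negative targets, so use different toys:
msle = mean_squared_log_error([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0])
```

Each function takes the true values first and the predictions second, which keeps them interchangeable when experimenting with different error measures.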
SciKit-Learn Train Test Split Function
According to our primary source for the SciKit-Learn model selection module, “Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set X_test, y_test.” Note that the word “experiment” is not intended to denote academic use only, because even in commercial settings machine learning usually starts out experimentally. If you would like to examine the train test split function in greater detail, I would refer you to this source on any SciKit-Learn related question.
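A minimal sketch of holding out a test set with this function, assuming a small toy dataset, looks like this:

```python
# Sketch: holding out 30% of a toy dataset with train_test_split.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features (toy data)
y = np.arange(10)

# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
```

With `test_size=0.3`, three of the ten samples are reserved for the test set and the remaining seven are available for training.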
Explaining Our Example
The initial example we exploited earlier may still be a bit confusing even after discussing the primary SciKit-Learn functions, so let us take a moment to pick apart the code used to create this functionality.
First, take a look at the learning curve we created. The curve for the training set begins at zero RMSE, which indicates that initially the model fits the few available training instances perfectly. As instances are progressively added to the training set, however, the model can no longer fit the data perfectly, and the training RMSE increases. Eventually, performance reaches a plateau, at which point new instances no longer change the model’s performance. If you’d like to understand the implications of this in greater detail, check out this source, which walks through similar examples clearly.
As we go on increasing the training set’s size, the training error continues rising. However, the validation error starts to fall as the model performs better on the validation set. Once the training set is large enough, the validation error and training error begin to converge. So, what can be inferred from this? The performance of the model won’t change much from that point on, irrespective of the size of the training set. However, if you try to add more features, it might make a difference. To better comprehend this concept, an alternative source may be consulted, this one in particular.
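The point about adding features can be sketched briefly. When two converged curves sit at a high error, the model is underfitting, and richer features (for example, polynomial terms) can lower that plateau. The dataset and the choice of degree below are illustrative assumptions:

```python
# Sketch: an underfitting linear model versus one with polynomial features.
# Toy quadratic data; degree=2 is an illustrative choice.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 6 - 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + 0.5 * rng.randn(200)

plain = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LinearRegression()).fit(X, y)

mse_plain = mean_squared_error(y, plain.predict(X))
mse_poly = mean_squared_error(y, poly.predict(X))
# On this curved data, the quadratic model fits far better than a plain line.
```

A straight line cannot capture the quadratic trend no matter how much data it sees, while the augmented model drives the error down toward the noise floor.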
The Take Away
We have now addressed in detail the many concepts associated with implementing learning curves for model validation in machine learning. This is an essential concept to understand, because the learning curve reveals how a model’s performance relates to its training data, complementing the other metrics we compute for a machine learning algorithm. In our next article, we discuss in great detail the variety of available regularized linear models for regression. However, if you would like to examine this subject matter in greater detail, I would advise you to check out this helpful resource regarding learning curves in machine learning.