Introduction to Polynomial Regression
This article explains polynomial regression and its role in machine learning. We previously covered linear regression in machine learning models, beginning with an introduction to linear regression. Following this, we examined three implementations of linear regression models: gradient descent, batch gradient descent, and stochastic gradient descent.
Much of our work so far has centered on two topics: machine learning models themselves and the performance metrics used to evaluate them. On the modeling side, we began with an overview of the primary machine learning models and algorithms in common use. Four follow-up articles expanded on this, covering supervised learning models, unsupervised learning models, a comparison of batch and online learning, and instance-based versus model-based learning. We then applied this knowledge in two succinct examples, which addressed classification algorithms on the MNIST data set and multi-class classification.
Having discussed the machine learning models themselves, we then turned to the metrics that let us quantify the performance of our models and algorithms: cross-validation, the confusion matrix, the distinction between precision and recall, and the ROC curve. All of these are readily available as analytical measures of our machine learning models. Here, however, we return to regression, focusing on polynomial regression rather than linear regression.
Conceptualizing Polynomial Regression
In our previous discussion of linear regression, we worked with data whose variables were linearly correlated. Here we model more complex data in which the variables are non-linearly related. In particular, the data follows a polynomial relationship.
To handle such data, we add powers of each feature, up to a chosen degree, as new features. We then train a linear model on this extended set of features. This is what is meant by polynomial regression: the model is still linear in its coefficients, even though it is non-linear in the original feature.
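The feature expansion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's original listing: the helper name `polynomial_features` and the noise-free quadratic data are assumptions made for the example.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return the columns [1, x, x^2, ..., x^degree] for a 1-D input array."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

# Hypothetical noise-free quadratic data: y = 1 + 2x + 3x^2
x = np.linspace(-2.0, 2.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Expand the single feature x into powers of x ...
X_poly = polynomial_features(x, degree=2)

# ... then run an ordinary linear least-squares fit on the expanded features.
coef, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(coef)  # coefficients ordered from the constant term upward
```

Because the data here contains no noise, the fitted coefficients should match the generating values 1, 2, and 3 up to floating-point error. The only non-linear step is the construction of `X_poly`; the fit itself is plain linear regression.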
So, when there is a non-linear relationship between the essential variables, we may use the polynomial regression model. This can be modeled in the following fashion:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \ldots + \beta_h X^h + \epsilon$$
In this model, we use a single predictor represented by the variable ‘X’, and ‘h’ represents the degree of the polynomial. The number of terms in the polynomial reflects its degree. If ‘h’ is two, the polynomial is a quadratic. If ‘h’ is three, the polynomial is a cubic, and so on.
According to Penn State University, in order to estimate the equation above, we would only need the response variable (Y) and the predictor variable (X). However, polynomial regression models may have other predictor variables in them as well, which could lead to interaction terms. So as you can see, the basic equation for a polynomial regression model above is a relatively simple model, but you can imagine how the model can grow depending on your situation.
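To make the growth of the model concrete, here is a small sketch of a second-degree model with two predictors, which introduces one interaction term. The function name `quadratic_design_matrix`, the predictor names `x1` and `x2`, and the generating coefficients are all hypothetical choices for illustration.

```python
import numpy as np

def quadratic_design_matrix(x1, x2):
    """Columns: 1, x1, x2, x1^2, x1*x2, x2^2 (full second-degree model)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 50)
x2 = rng.uniform(-1.0, 1.0, 50)

# Hypothetical noise-free response containing an x1*x2 interaction
y2 = 0.5 + 1.5 * x1 - 2.0 * x2 + 0.8 * x1 * x2

A = quadratic_design_matrix(x1, x2)
beta, *_ = np.linalg.lstsq(A, y2, rcond=None)
print(A.shape)  # six columns for two predictors at degree two
```

With one predictor a quadratic model has three coefficients; with two predictors it already has six, which is the kind of growth the passage above refers to.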
Guidelines for Polynomial Regression
Some general guidelines to keep in mind when estimating a polynomial regression model are:
- The fitted model is more reliable when it is built on a larger sample size n.
- Do not extrapolate beyond the limits of your observed values, particularly when the polynomial function has a pronounced curve such that an extrapolation produces meaningless results beyond the scope of the model.
- Consider how large the size of the predictor(s) will be when incorporating higher degree terms as this may cause numerical overflow for the statistical software being used.
- Do not go strictly by low p-values to incorporate a higher degree term, but rather use these to support your model only if the resulting residual plots look reasonable. This is an example of a situation where you need to determine “practical significance” versus “statistical significance”.
- In general, as is standard practice throughout regression modeling, your models should adhere to the hierarchy principle, which says that if your model includes $X^h$ and $X^h$ is shown to be a statistically significant predictor of Y, then your model should also include each $X^j$ for all $j < h$, whether or not the coefficients for these lower-order terms are significant. In other words, when fitting polynomial regression functions, fit a higher-order model and then explore whether a lower-order (simpler) model is adequate. For example, suppose we formulate the following cubic polynomial regression function:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$$
Then, to see if the simpler first order model (a “straight line”) is adequate in describing the trend in the data, we could test the null hypothesis:
$$H_0: \beta_2 = \beta_3 = 0$$
However, if a polynomial term of a given order is retained, then all related lower-order terms are also retained. That is, if a quadratic term ($X^2$) is deemed significant, then it is standard practice to use this regression function:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon$$
and not this one:
$$Y = \beta_0 + \beta_2 X^2 + \epsilon$$
whether or not the linear term ($X$) is significant. That is, we always fit the terms of a polynomial model in a hierarchical manner.
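One common way to carry out the test described above is a partial F-test that compares the residual sum of squares of the cubic model with that of the straight-line model. The sketch below is an assumption about how such a test might be coded, not the procedure used in the article; the function names, the simulated data, and the noise level are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def partial_f_statistic(x, y, full_degree=3, reduced_degree=1):
    """F statistic for H0: every coefficient above reduced_degree is zero."""
    n = len(x)
    X_full = np.vander(x, full_degree + 1, increasing=True)
    X_red = np.vander(x, reduced_degree + 1, increasing=True)
    rss_full, rss_red = rss(X_full, y), rss(X_red, y)
    q = full_degree - reduced_degree       # number of coefficients being tested
    df_full = n - (full_degree + 1)        # residual degrees of freedom, full model
    return ((rss_red - rss_full) / q) / (rss_full / df_full)

# Simulated data: one truly linear response, one with a cubic component
rng = np.random.default_rng(42)
x = np.linspace(-3.0, 3.0, 60)
noise = rng.normal(0.0, 1.0, x.size)
y_line = 1.0 + 2.0 * x + noise
y_cubic = 1.0 + 2.0 * x + 0.5 * x**3 + noise

f_line = partial_f_statistic(x, y_line)
f_cubic = partial_f_statistic(x, y_cubic)
print(f_line, f_cubic)
```

A large F statistic is evidence against the straight-line model. To turn the statistic into a p-value, it can be compared with an F distribution with `q` and `df_full` degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available. As the guidelines note, such a p-value should be read alongside the residual plots rather than on its own.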
Encoding a Polynomial Regression Model
In our Python program, we can easily populate our script with data that behaves according to a cubic equation. Recall that cubic equations have a degree of three, such that in our polynomial model, h=3. Take a look at how we might model this:
Here, we are modeling a polynomial cubic that takes the following form:
When we plot the function as a scatter plot, we obtain the following figure:
All we have to do to create this polynomial plot is input and execute the following code:
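A minimal sketch of a script along these lines is shown here. The cubic used, y = 0.5x^3 - x^2 + 2x + 1, the noise level, and the function names are assumptions chosen only for illustration; they are not the coefficients behind the article's figure.

```python
import numpy as np

# Hypothetical cubic, highest power first: 0.5x^3 - x^2 + 2x + 1
TRUE_COEFFS = [0.5, -1.0, 2.0, 1.0]

def make_cubic_data(n=100, noise_sd=1.0, seed=0):
    """Draw n points from the hypothetical cubic with Gaussian noise added."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, n)
    y = np.polyval(TRUE_COEFFS, x) + rng.normal(0.0, noise_sd, n)
    return x, y

def plot_cubic(x, y, fitted):
    """Scatter the data and overlay the fitted curve (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the fit runs without it
    grid = np.linspace(x.min(), x.max(), 200)
    plt.scatter(x, y, s=10, label="data")
    plt.plot(grid, np.polyval(fitted, grid), color="red", label="degree-3 fit")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend()
    plt.show()

x, y = make_cubic_data()
fitted = np.polyfit(x, y, deg=3)  # least-squares polynomial fit with h = 3
print(fitted)                     # coefficients, highest power first
```

Calling `plot_cubic(x, y, fitted)` draws the scatter plot with the fitted cubic on top. `np.polyfit` is used here for brevity; the same fit could be obtained by expanding the feature into powers and training a linear model on them.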
Fitting Machine Learning Data
Choosing a polynomial degree higher than needed for an exact fit is undesirable for the reasons given in the guidelines above, and it also leads to a case where there are infinitely many solutions. For example, a first degree polynomial (a line) constrained by only a single point, instead of the usual two, would give an infinite number of solutions. This raises the problem of how to compare and choose just one solution, which can be a problem for software and for humans alike. For this reason, it is usually best to choose as low a degree as possible for an exact match on all constraints, and perhaps an even lower degree if an approximate fit is acceptable.
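The non-uniqueness described above can be checked numerically. In this sketch, three hypothetical points are fitted with a degree-5 polynomial, which has six coefficients; the points and variable names are illustrative assumptions.

```python
import numpy as np

# Three hypothetical points, but a degree-5 polynomial has six coefficients
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])
A = np.vander(x, 6, increasing=True)   # 3 equations, 6 unknowns

# The system is rank-deficient relative to the number of coefficients
rank = np.linalg.matrix_rank(A)

# lstsq picks one particular solution: the one with the smallest norm
coef_min_norm, *_ = np.linalg.lstsq(A, y, rcond=None)

# Adding any null-space vector of A gives another exact fit
_, _, vt = np.linalg.svd(A)
null_vector = vt[-1]                   # A @ null_vector is numerically zero
coef_other = coef_min_norm + 5.0 * null_vector

print(rank)
print(np.allclose(A @ coef_min_norm, y), np.allclose(A @ coef_other, y))
```

Both coefficient vectors pass through all three points, yet they describe different curves between and beyond them. Software has to pick one by some convention, here the minimum-norm solution, which is why a lower degree is usually the safer choice.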
The Take Away
This article discussed machine learning models based on polynomial regression. We covered their mathematical features and the code needed to develop these models. In our next article, we intend to present the features of learning curves in machine learning models. We look forward to seeing you there. If you seek to examine this subject in more detail, however, definitely consider checking out this resource.