Linear Regression Models: Topics of Machine Learning

An Introduction to Linear Regression

The present article explores the intricacies of linear regression models in machine learning. Before embarking on this discussion, we would like to provide a brief overview of the analyses explored in the Topics of Machine Learning series.

Much of our investigative effort has centered on two topics: the machine learning models themselves and the performance metrics used to evaluate them. With respect to the models, several articles have been published. We began with an overview of the machine learning models and algorithms most frequently utilized. This was immediately followed by a series of four articles expanding on that overview, covering supervised learning models, unsupervised learning models, batch versus online learning, and instance-based versus model-based learning. We then put this knowledge to work in practical examples, the first of which addressed classification algorithms on the MNIST data set as well as multiclass classification.


Having discussed the intricacies of the machine learning models themselves, we then turned to the metrics that allow us to quantify the performance of our models and algorithms. These discussions of performance measures covered cross-validation, the confusion matrix, the distinction between precision and recall, and the utility of the ROC curve, all of which are readily available as analytical measures of our machine learning models. Here, we move toward the theory and application of fundamental algorithms, beginning with perhaps the most ubiquitous machine learning algorithm of all: the linear regression model.

Before We Get Started

I want to let you all in on a secret before we proceed with the modeling of machine learning derived data. One of the trickiest things about working in machine learning is that you sometimes need to create what amounts to your own programming language. While we plan on publishing a series on this in the future, I myself learned from the folks at ProLang. They offer an excellent book that goes into the details of creating language features that best support machine learning. Follow the link provided here; it will take you to ProLang, where you can sign up and get their book. Getting this book was one of the best decisions I have made, and it drastically improved my coding experience. All you need to do is click the link and order a copy of the book; they even let you get it for free. Fill out an order and begin creating your own sophisticated machine learning models today.

Conceptualizing Linear Regression

Overview

A linear regression model may be trained by two separate methodologies. First, we may use a closed-form equation that directly computes the model parameters yielding the best fit, that is, the parameter values that minimize the cost function over the training set. Alternatively, we may use an iterative methodology such as Gradient Descent (GD), which gradually tweaks the parameter values, likewise seeking to minimize the cost function.

Linear Regression Models

We begin by examining the closed-form approach to training the model. Here the model is a linear function that predicts the value of one parameter of a data object from the values of its other parameters: the prediction is a weighted linear combination of the known parameters. Take a look at the following closed-form linear regression model:

\hat{y}=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n

Here, ŷ represents the value of the parameter to be predicted. The known parameters of the data object are the features x1 through xn, while ‘y’ itself is the single unknown. The theta values are weights assigned to the different known parameters, with some stipulating a greater significance than others in determining the value of ‘y’; θ0 is the bias term, attached to no feature.

Because a given data object has a series of known parameters, we can collect them in a vector x. Each element of x is associated with a weight ‘θ‘, and thus we may model the thetas in their own vector θ^T. To make room for the bias term θ0, we also include a constant element x0 = 1 at the start of x. If the vector x has the form:

x=[x_0,x_1,x_2,…,x_n],\quad x_0=1

and the vector θ^T is modeled as:

\theta^T=[\theta_0,\theta_1,\theta_2,…,\theta_n]

then a vectorized version of the closed form linear regression algorithm may be displayed as:

\hat{y}=h_{\theta}(\bold{x})=\theta^T\cdot\bold{x}

Theta and x here are both vectors, so the predicted value ŷ is computed by their dot product. If you need a refresher on this computation, check out our article which discusses this subject specifically.
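As a quick illustration, here is a minimal NumPy sketch of that dot product, with made-up parameter and feature values chosen purely for demonstration (the leading 1 in x pairs with the bias term θ0):

```python
import numpy as np

# Hypothetical weights: theta_0 is the bias term, theta_1 and theta_2 weight two features
theta = np.array([4.0, 3.0, 0.5])

# One data object with two known features; the leading 1 corresponds to x_0
x = np.array([1.0, 2.0, 6.0])

# The prediction y_hat is the dot product of the two vectors
y_hat = theta.dot(x)
print(y_hat)  # 4 + 3*2 + 0.5*6 = 13.0
```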

Cost Functions

To train a linear regression model, our primary endeavor is to set the parameters so that the model fits the training data as well as possible. To do this, we first need a means of quantifying how well the model performs. One of the most common performance measures for linear regression is the Root Mean Squared Error (RMSE), and training amounts to finding the value of theta that minimizes it. In practice it is simpler to minimize the Mean Squared Error (MSE), which yields the same optimal theta. The Mean Squared Error function may be modeled as follows:

MSE(\bold{X},h_\theta)=\frac{1}{m}\sum_{i=1}^m(\theta^T\cdot\bold{x}^{(i)}-y^{(i)})^2

In other words, given the training instances x^(i) and their known values y^(i), we seek the theta that minimizes the MSE over the entire data set.
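A minimal sketch of this computation in NumPy might look like the following, assuming the feature matrix X already carries a leading column of ones for the bias term:

```python
import numpy as np

def mse(X, y, theta):
    """Mean Squared Error of a linear model with parameter vector theta.

    X is an m x (n + 1) matrix whose first column is all ones (the x_0 term),
    and y holds the m observed target values.
    """
    predictions = X.dot(theta)     # theta^T . x for every instance at once
    errors = predictions - y       # residual for each instance
    return (errors ** 2).mean()    # average of the squared residuals
```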

In Supervised Machine Learning

When supervised machine learning is implemented, according to Kavitha and Varuna, “The training data is mapped to new value from the input data and produce result.” In this manner, “The training dataset consist number of tuples (T). Each tuple is a vector which contains attribute values. The target data can have more than one possible outcome or a continuous value.”

With classification algorithms, the value of an unknown parameter in a data object is predicted based on its belonging to a particular category according to its other parameters. As Kavitha and Varuna argue, “[Classifiers] can be built using the classification algorithm from the training data set. Each tuple consists of training data set with the class label. Then the test data is used to evaluate the classification rules using the classifier.”

Regression provides an alternative. Regression is a statistical method for identifying the relationship between variables, specifically between dependent and independent variables. As contended by Kavitha and Varuna, “The variable dependency can be either univariate or multivariate regression. Univariate regression identifies the dependency among single variable.”

Training the Linear Regression Model

The previous section demonstrated the necessity of identifying the value of theta which minimizes the MSE cost function. There is in fact an equation which permits this computation directly, known as the Normal equation. The Normal equation may be modeled as follows:

\hat{\theta}=(\bold{X}^T\cdot\bold{X})^{-1}\cdot \bold{X}^T\cdot \bold{y}

We can test this linear model with some roughly linear data. First, import NumPy into your script and create some randomized data using NumPy's random functions. Take a look at the code we use to generate this data:
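A minimal sketch of such a data-generation step might look like the following, where the sample size (100) and the “true” intercept and slope (4 and 3) are assumptions chosen purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 instances with a single feature, drawn uniformly from [0, 2)
X = 2 * np.random.rand(100, 1)

# A linear relationship y = 4 + 3x plus Gaussian noise; 4 and 3 are the
# illustrative "true" parameters the model should roughly recover
y = 4 + 3 * X + np.random.randn(100, 1)

plt.plot(X, y, "b.")
plt.xlabel("$x_1$")
plt.ylabel("$y$")
plt.show()
```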

When executed, this code yields a scatter plot of the generated data.

With the Normal equation, we compute the parameter vector theta. NumPy offers several functions that assist in this computation. For example, we may use the inv function from NumPy's linear algebra module to compute the inverse of a matrix, as well as the dot function to compute the dot product of two vectors or matrices. Let's take a look at how this manifests in code:
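A sketch of that computation, reusing the X and y generated above, might look like this:

```python
# Prepend x_0 = 1 to every instance so that theta_0, the bias term, is included
X_b = np.c_[np.ones((100, 1)), X]

# Normal equation: theta_hat = (X^T . X)^-1 . X^T . y
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(theta_best)
```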

When we run this code, we receive an array that stores the estimated values of the theta parameters. Because of the noise added to the data, these values will be close to, but not exactly equal to, the parameters used to generate it.

Once we have these values, we can make predictions of ‘y’ using these optimized parameters. Let’s take a look at the code we use for making these predictions:
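A sketch of the prediction step, continuing from the variables defined above, might look like this:

```python
# Two new instances at the ends of the feature range
X_new = np.array([[0.0], [2.0]])
X_new_b = np.c_[np.ones((2, 1)), X_new]   # add x_0 = 1 to each instance

# Predict with the parameters returned by the Normal equation
y_predict = X_new_b.dot(theta_best)

# Plot the fitted line over the training data
plt.plot(X_new, y_predict, "r-", label="predictions")
plt.plot(X, y, "b.", label="training data")
plt.legend()
plt.show()
```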

When we run this code, we obtain a plot showing the fitted regression line overlaid on the training data.

Now, with this linear model, we can make quite effective predictions for new data objects, whatever the values of their ‘x’ parameters.

The Take Away

Linear regression is an essential methodology for training machine learning models, offering a mathematically grounded means of making accurate predictions about data objects. A variety of means of executing linear regression are available, and these will be explored in greater depth in subsequent articles. Our first foray into this subject matter will be an analysis of gradient descent models. We look forward to seeing you there.
