An Introduction to Machine Learning Systems
This article begins our series on machine learning and the mechanisms by which machines learn. Before discussing those mechanisms, we must first answer a basic question: “What exactly is machine learning?” Machine learning is a branch of programming that endeavors to create systems capable of learning from data and responding appropriately to new data without explicit instruction. A machine learning system takes instances of training data and uses them to identify patterns that can then be exploited to interpret non-training data. In this sense, the machine is said to ‘learn’.
Having addressed the nature of machine learning, it is worth answering another question: “Why use machine learning as a problem-solving methodology?” While the human brain is an effective pattern detector, real data contains a myriad of deviations from the norm, as well as parameters influencing behavior beyond our view. Furthermore, many machine learning systems are capable of learning on the fly, making them quite efficient at adapting to new circumstances. This makes machine learning systems exceptional predictors when the relevant variables are accounted for. We take the opportunity here to discuss, at a rudimentary level, the different types of machine learning systems available for design.
Types of Machine Learning Systems
The types of machine learning systems may be categorized according to three fundamental questions:
- Is the system trained with human intervention or not?
- Does the system learn in batches or on the fly?
- Does the system compare new data to training data, or identify patterns to build a predictive model?
Supervised Machine Learning
What is Supervised Machine Learning?
Supervised machine learning arises as one answer to our first question: whether or not the machine learns with human intervention. With supervised machine learning, the training data delivered to the machine is supplemented with labels. Thus, the machine is given not only the input but also the solution.
Supervised Machine Learning Tasks
A variety of tasks may be accomplished by supervised machine learning. One such task is classification. In machine learning classification, the system is trained on labeled data and learns to assign each object to one of several groups. These groups may be specified by the labels, or they can be specified by the creator.
Another supervised machine learning task requires the system to predict a target numerical value for a particular object based on the numerical values of known objects with a specific set of attributes. This method of prediction is known as regression, as the target value is estimated from the known values.
According to an article at Machine Learning Mastery, the typical supervised machine learning system may be modeled mathematically as a mapping which contends that there is a functional relationship ‘f’ between the input variables ‘x’ and the solution variable ‘y’. Such a model appears as:

y = f(x)
Note here that ‘f’ serves both as the machine learning algorithm and as a function in the mathematical sense, defining a relationship that links the input to the output. The machine learning system learns from the training data to discern this relationship. Classification tasks associated with supervised machine learning include:
- Linear Classification
- Support Vector Machines (SVMs)
- Decision Trees
- K-Nearest Neighbor
- Random Forest
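To make the classification idea concrete, the following is a minimal sketch of one of the listed techniques, K-Nearest Neighbor, written from scratch. The function name `knn_classify` and the toy data set are our own inventions for illustration, not part of any particular library:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k labeled
    training points closest to it (Euclidean distance)."""
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy labeled training data: two clusters on a plane
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((3.0, 3.0), "b"), ((3.1, 2.9), "b"), ((2.9, 3.2), "b")]

print(knn_classify(train, (0.3, 0.1)))  # near the first cluster -> a
print(knn_classify(train, (2.8, 3.1)))  # near the second cluster -> b
```

Because the labels (“a” and “b”) are supplied along with the training data, this is supervised learning: the system receives both the input and the solution.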
Compared to classification, regression is a slightly more sophisticated methodology, since the relationship between input and output cannot always be represented by so simple an ‘f’.
We noted previously that the purpose of regression in machine learning is the prediction of a numerical value from the input. This contention is validated by an article from Towards Data Science, which states that “the goal of [the] regression algorithm is to predict a continuous number.”
A previous series on linear algebra discussed computations of linear systems and linear combinations. For greater insight into this subject, you may wish to consult our discussion of Gauss-Jordan reduction, as this mathematical concept underlies much of regression. Nevertheless, we may model the function of linear regression as:

y = w[1]x[1] + w[2]x[2] + … + w[p]x[p] + b
In this function, ‘x[i]’ represents the features of the data, meaning the data in conjunction with its associated attributes, while ‘w[i]’ and ‘b’ are parameters of the model that are inferred during training. Simplifying to a single feature, we observe a linear relationship of the form:

y = wx + b
This relationship constitutes the method of regression applied to supervised machine learning models. Common methods of regression include:
- Linear Regression
- Polynomial Regression
- Quadratic Regression
- Logistic Regression
- Symlog Regression
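The simplest of these, ordinary linear regression with a single feature, can be fit in closed form. The helper name `fit_linear` and the sample numbers below are hypothetical, chosen so the data roughly follows y = 2x + 1:

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = w*x + b for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form slope: covariance(x, y) / variance(x)
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]   # noisy samples of roughly y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))  # -> 1.94 1.15
```

The fitted ‘w’ and ‘b’ are exactly the parameters described above: they are inferred from the training data, after which new predictions are computed as w*x + b.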
Unsupervised Machine Learning
What is Unsupervised Machine Learning?
Unsupervised machine learning is the other answer to our question of whether human intervention is involved in training the model. With unsupervised machine learning, the training data is unlabeled. In this case, the system attempts to learn without explicit instruction, seeking instead to identify structure on its own.
Unsupervised Machine Learning Tasks
Because unsupervised machine learning systems are not bogged down by additional input data (specifically, the labels), a variety of efficient machine learning tasks can be executed with this methodology. One of these is clustering, in which the system divides data into groups based on parameters associated with the data. Clustering algorithms can go so far as to divide these groups into their own subgroups for enhanced precision, especially when working with large data sets.
Unsupervised machine learning also supports visualization tasks, which produce visual representations of unlabeled data. The output can be either a two-dimensional or three-dimensional model of the data in space.
When working with unlabeled data, some parameters are irrelevant on their own, giving rise to a problem of excessive data. To solve this problem, we can apply dimensionality reduction, wherein the data is simplified by merging multiple correlated parameters into one comprehensive attribute.
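A full treatment of dimensionality reduction (for example, principal component analysis) is beyond this article, but the merging idea can be sketched crudely: standardize a set of correlated columns and replace them with their average. The function `merge_features` and the sample rows are our own illustration, not a standard API:

```python
def merge_features(rows, idxs):
    """Collapse the (assumed correlated) columns listed in `idxs`
    into a single composite attribute: the mean of their
    standardized values. A crude stand-in for methods like PCA."""
    cols = list(zip(*rows))

    def standardized(col):
        m = sum(col) / len(col)
        s = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
        return [(v - m) / s for v in col]

    zcols = {i: standardized(cols[i]) for i in idxs}
    merged = [sum(zcols[i][r] for i in idxs) / len(idxs)
              for r in range(len(rows))]
    kept = [[row[j] for j in range(len(row)) if j not in idxs]
            for row in rows]
    return [k + [m] for k, m in zip(kept, merged)]

# Columns 0 and 1 are near-duplicates (e.g. height in cm and in inches)
rows = [[180, 70.9, 75], [165, 65.0, 60], [172, 67.7, 68]]
reduced = merge_features(rows, idxs=(0, 1))
print([len(r) for r in reduced])  # -> [2, 2, 2]
```

Each row shrinks from three columns to two: the two redundant measurements have been collapsed into one comprehensive attribute, exactly the simplification described above.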
Building on dimensionality reduction, another unsupervised learning task is anomaly detection, which identifies outliers in a data set. The system is trained to recognize the normal behavior of objects belonging to the data set; objects that do not adhere to this norm are considered anomalies.
Finally, with respect to unsupervised learning, associative learning is one particular task that may prove to be especially useful. This methodology takes multi-dimensional data as input and identifies relationships between various parameters of the data.
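A minimal flavor of associative learning is counting which pairs of parameters (here, items in shopping baskets) occur together more often than some threshold. The function `frequent_pairs` and the basket data are invented for illustration; real association-rule miners such as Apriori are considerably more refined:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(baskets, min_count=2):
    """Crude association mining: count how often each pair of
    items appears in the same basket, and keep frequent pairs."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return [pair for pair, c in counts.items() if c >= min_count]

baskets = [["bread", "butter", "jam"],
           ["bread", "butter"],
           ["bread", "milk"],
           ["butter", "jam", "bread"]]
print(sorted(frequent_pairs(baskets)))
```

No labels are involved: the relationships between parameters (bread tends to co-occur with butter) are discovered from the raw data alone.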
We previously described clustering as an unsupervised machine learning methodology that uses the parameters associated with particular data entries to identify groups and subgroups in the data. Clustering supports several different algorithms that can achieve this end, and these will be expanded on thoroughly in a subsequent article. Nevertheless, the primary algorithms include:
- K-Means Clustering
- Hierarchical Clustering Analysis (HCA)
- Expectation Maximization
In a Towards Data Science article surveying clustering algorithms, George Seif validates this view of clustering, arguing that “data points that are in the same group should have similar properties.” Clustering algorithms exploit these shared properties to develop closely unified groups and subgroups. The various algorithms achieve this end by slightly different methods favoring slightly different goals; these will be addressed in a subsequent article.
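Of these, K-Means is the easiest to sketch. The toy implementation below alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its group; the function name and the six sample points are our own, and a production version would handle initialization and convergence far more carefully:

```python
import math
import random

def k_means(points, k, steps=20, seed=0):
    """Plain k-means: repeatedly assign points to the nearest
    centroid, then move each centroid to its group's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(steps):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[i].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups

points = [(0, 0), (0.2, 0.1), (0.1, 0.3),     # cluster near the origin
          (5, 5), (5.2, 4.9), (4.8, 5.1)]     # cluster near (5, 5)
centroids, groups = k_means(points, k=2)
print(sorted(len(g) for g in groups))  # -> [3, 3]
```

Note that no labels were given: the two groups emerge purely from the similarity of the points, as Seif's argument suggests.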
Anomaly detection is perhaps one of the simplest tasks we can perform with unsupervised learning. In many cases, especially with large data sets, it is much easier to identify differences than similarities. One reason for its simplicity is that anomalies can often be identified with rather straightforward statistical concepts such as the standard deviation or the correlation coefficient. Susan Li, another writer for Towards Data Science, presents a variety of anomaly detection methodologies in one of her many articles.
In that article, she explains the fundamental premises of anomaly detection, particularly the fact that anomalies are rare and ought to be readily identifiable in a data set.
We may extrapolate a bit on this conviction: if a human is able to readily identify outliers in a set, then a machine learning system ought to be capable of doing so with a much greater degree of specificity. While anomaly detection algorithms differ in the means by which they identify anomalies, rest assured that they all revolve around exploiting significant parameter differences. The various anomaly detection methods will be addressed in depth in a later article.
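The standard-deviation rule mentioned above is simple enough to sketch directly: flag any value lying more than a chosen number of standard deviations from the mean. The function `find_anomalies`, the threshold of 2.0, and the sensor-style readings are all illustrative assumptions:

```python
def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations
    from the mean of the data set."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 42.0]
print(find_anomalies(readings))  # -> [42.0]
```

The six clustered readings define the norm, and the lone reading of 42.0 fails to adhere to it, so it is reported as the anomaly.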
What is Batch Learning?
As we move on from supervised and unsupervised learning, batch learning provides an avenue for answering the second question. Batch learning systems do not learn on the fly: they are incapable of learning incrementally and must be trained with all of the data at once. The system is trained in one pass and subsequently launched, after which it applies what it has learned to new data. This is also known as offline learning. Furthermore, teaching the system with new data requires a new version of the system to be created.
Batch Reinforcement Learning
Batch learning can be applied to a type of learning known as reinforcement learning. In reinforcement learning, an agent within the program observes a data-constructed environment and performs actions. These actions return rewards which either encourage the agent to repeat the action or to cease it.
Adding batch learning to reinforcement learning addresses a particular problem of the latter: teaching an agent can take too much time if the system is overly complex. Providing the agent with large aggregates of data at once can alleviate this issue.
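To make the reward loop concrete, here is a hypothetical toy agent (an epsilon-greedy bandit, our own choice of example rather than a batch algorithm from the article): it mostly repeats whichever action has earned the best average reward so far, and the rewards gradually steer it toward the most profitable action:

```python
import random

def train_agent(reward_probs, episodes=2000, epsilon=0.1, seed=0):
    """Minimal reward-driven agent: usually repeat the action with
    the best average reward, occasionally explore at random."""
    rng = random.Random(seed)
    totals = [0.0] * len(reward_probs)   # summed reward per action
    counts = [0] * len(reward_probs)     # times each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon or not any(counts):
            a = rng.randrange(len(reward_probs))          # explore
        else:
            a = max(range(len(reward_probs)),             # exploit
                    key=lambda i: totals[i] / counts[i] if counts[i] else 0.0)
        reward = 1.0 if rng.random() < reward_probs[a] else 0.0
        totals[a] += reward
        counts[a] += 1
    return counts

counts = train_agent([0.2, 0.8, 0.4])   # action 1 pays off most often
print(counts.index(max(counts)))        # -> 1: the agent favors action 1
```

The rewards here play exactly the role described above: actions that pay off are repeated, and actions that do not are gradually abandoned.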
What is Online Learning?
On this same question of whether a system can learn incrementally, online learning is the methodology that permits it. In online learning, the system is fed mini-batches of data sequentially, so that it can constantly improve its methodology. One advantage of this regimen is that it is quite quick; furthermore, it does not require a large initial training set.
Online Learning Tasks
Online learning is widely used across the machine learning ecosystem. One of its primary uses is in dynamic systems, which is to say systems that receive a constant flow of information. Here the model must be adept at adapting to a constantly changing data environment in order to derive a proper solution.
It is also particularly applicable where there is insufficient memory to store the training data. When the data exceeds the storage capacity of the computer, online learning can be employed to teach the model incrementally, so that the full data set never needs to be held in memory at once.
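This incremental regimen can be sketched with a simple linear model y = w*x + b trained by mini-batch gradient descent. The setup below is our own toy illustration (libraries such as scikit-learn expose a similar idea through `partial_fit`):

```python
def sgd_linear(batches, lr=0.05, epochs=200):
    """Learn y = w*x + b incrementally: each mini-batch nudges the
    parameters by one gradient step, then can be discarded, so the
    full data set never has to sit in memory."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in batches:
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# A stream of mini-batches drawn from the line y = 2x + 1
batches = [[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(4, 9), (5, 11)]]
w, b = sgd_linear(batches)
print(round(w, 1), round(b, 1))  # -> 2.0 1.0
```

In a real deployment each mini-batch could be read from disk or a network stream, used for one update, and thrown away, which is precisely what makes online learning suitable for data that exceeds memory.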
Controlling the Learning Rate
When working with an online learning model, it is important to establish the learning rate of the system. In a comprehensive article at Machine Learning Mastery, Jason Brownlee describes the learning rate as a kind of thermostat controlling how much the model changes in response to error. A high learning rate facilitates rapid adaptation, allowing the system to quickly account for changes as new data enters the system. Alternatively, a low learning rate adapts more slowly but is unlikely to forget various parameters of the data. Which one you employ depends on the circumstances of the data and the environment you operate in.
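The thermostat analogy can be sketched with a one-parameter model updated online, where `rate` plays the role of the learning rate; the `track` function and the step-shaped signal are invented for illustration:

```python
def track(signal, rate):
    """One-parameter 'model' updated online:
    estimate += rate * error."""
    estimate = 0.0
    history = []
    for value in signal:
        estimate += rate * (value - estimate)
        history.append(estimate)
    return history

# The signal jumps from 0 to 10 halfway through the stream
signal = [0.0] * 10 + [10.0] * 10
fast = track(signal, rate=0.9)
slow = track(signal, rate=0.1)
print(round(fast[-1], 2), round(slow[-1], 2))  # -> 10.0 6.51
```

The fast learner has fully caught the jump by the end of the stream, while the slow learner still lags well behind it; the slow learner, however, would be far less disturbed by a single noisy reading. This is the adaptation-versus-stability trade-off described above.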
Instance Based Learning
What is Instance Based Learning?
Finally, we arrive at the machine learning systems that answer the final question: whether the system derives solutions by comparing new data to known data, or predicts them from a model. Instance based learning takes the former approach, deriving solutions by comparing a new data instance to the known training data. Based on the instance’s relationship to the known data, its properties may be extrapolated.
Tasks of Instance Based Learning
An instance based learning model relies on a measure of similarity. More specifically, it examines the degree of similarity between the instance data and the known data, generalizing from the known data to derive the parameters of the new data.
In an article on instance based learning, Keogh contends that instance based learning relies on both classification and regression to predict the labels that would be associated with the new data, using the training data set as the basis for this judgement. The methodology is rather efficient: all the system must do is store the data and, at run time, compare the new data to the training data to identify the nearest neighbor.
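This store-and-compare procedure can be written in a few lines, here in a regression flavor where the prediction is simply the value attached to the closest stored instance. The function name and the (size, rooms) to price data are hypothetical:

```python
import math

def nearest_neighbor_predict(train, query):
    """Instance-based prediction: no model is built up front; the
    system stores the training pairs and, at run time, returns the
    value attached to the closest stored instance."""
    features, value = min(train, key=lambda pair: math.dist(pair[0], query))
    return value

# Stored instances: (size, rooms) -> known price
train = [((50, 2), 150), ((80, 3), 230), ((120, 4), 330)]
print(nearest_neighbor_predict(train, (78, 3)))  # closest to (80, 3) -> 230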
Model Based Learning
What is Model Based Learning?
The model based learning methodology answers the third question differently. Rather than comparing new data to known data, it uses the known data to create a model, and this model predicts the value of the new data instance. Model based learning is thus a matter of prediction rather than of identifying a nearest neighbor.
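For contrast with the instance based approach, consider a model-based version of a toy prediction problem: the training pairs are distilled once into two parameters, and afterwards the model alone, not the stored instances, answers queries. The data and names are again our own illustration, assuming a simple least-squares line:

```python
def fit(xs, ys):
    """Model based learning in miniature: distil the training data
    into two parameters (w, b); the data can then be discarded."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

# Train once on (size -> price) pairs, then predict from the model
w, b = fit([50, 80, 120], [150, 230, 330])
predict = lambda x: w * x + b
print(round(predict(100)))  # -> 279
```

Unlike the nearest-neighbor sketch, the answer here is not any stored value: it is produced by the fitted model, interpolating between the training examples.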
Summarizing the Various Machine Learning Models
This article surveyed the six primary machine learning models:
- Supervised Learning
- Unsupervised Learning
- Batch Learning
- Online Learning
- Instance Based Learning
- Model Based Learning
As we have seen, these systems operate in widely different fashions and rely on significantly different algorithms. Nevertheless, the goal of all these systems remains the same: to execute the underlying premise of machine learning, which is to use known or incoming data to predict the values of new data.
Subsequent articles intend to explore this subject matter in greater depth, so if questions remain, consider consulting the embedded sources. Furthermore, machine learning models rely heavily on various other aspects of computer science and mathematics.
A machine learning developer would struggle without prior background in libraries such as Matplotlib or Pandas, and without understanding the mathematics behind the algorithms. For that reason, we highly recommend consulting the following articles for insight into these concepts:
- Partial Derivatives:
- Vector Calculus:
- Pandas Tutorials:
- Linear Algebra: