Introduction to Machine Learning Resources
Recently, we have received a large number of requests for resources that help in understanding machine learning from first principles. These requests inspired this article. To set the scene, we want to take a moment to reflect on the topics covered so far in this machine learning series. Most of our efforts have centered on one of two topics: the machine learning models themselves and the metrics used to measure their performance.
Several articles have covered the machine learning models themselves. We began with an overview of the primary machine learning models and algorithms in common use. This was followed by a series of four articles that expanded on the overview, covering supervised learning models, unsupervised learning models, a comparison of batch and online learning, and instance-based vs. model-based learning. We then put this knowledge into practice in two succinct examples. The first addressed classification algorithms on the MNIST data set, as well as multi-class classification.
Having discussed the intricacies of the models themselves, we then elaborated on the metrics that allow us to quantify the performance of our models and algorithms. These discussions of performance measures revolved around cross-validation techniques, the confusion matrix, the distinction between precision and recall, and the utility of the ROC curve. All of these are readily available as analytical measures of our machine learning models. Here, we move towards more theoretical discussions and applicability of fundamental algorithms. In this case, let us explore perhaps the most ubiquitous machine learning algorithm, the stochastic gradient descent algorithm.
Resource #1: Hands-On Machine Learning With Scikit-Learn and TensorFlow
By far the best resource I have used to advance my knowledge of machine learning and data science is the book “Hands-On Machine Learning With Scikit-Learn and TensorFlow.” It was written by Aurélien Géron and covers the most important facets of machine learning. Without doubt, I have used this book more extensively than any other resource.
In addition to giving a thorough exposition of the fundamentals of machine learning, it goes into great detail on more complex subjects, including convolutional neural networks and deep neural networks. The book also provides deep insight into the mathematics that underlies the important features of machine learning, combined with extensively annotated code that demonstrates how to put it all into a program.
I really can’t recommend this book enough. If there were only one resource you could have to learn machine learning from, this would definitely be the one. I attribute my becoming a full-time data scientist to this book, as it brought to light so many nuances of machine learning that I have not found in other resources. If this is something you are interested in as well, check it out here.
Resource #2: Coursera Machine Learning By Stanford
Now, I have a lot of reservations about using websites like Coursera to acquire computer science and programming knowledge. First of all, while they do have free options available, they often attach a pretty hefty price to their ‘certifications’, even though the free option gives the same information. This seems intellectually disingenuous, and their certifications pretty much mean diddly squat in the workplace.
There are plenty of alternative resources out there that don’t double as a status symbol and will, in all honesty, give you much better insight. Nevertheless, the Coursera Machine Learning course by Stanford was quite helpful. I did not use the video lectures, as I found the reading materials sufficient for my purposes. I recommend it not so that you go and throw your entire wallet at this corporation, but because it offers several options that help you get started with machine learning. If you’re looking for upper-level material, however, this is not the place to go. Still, if the free beginner’s reading material piques your interest, you can find it here.
Resource #3: The Data Science Handbook
I’m glad to leave that Coursera stuff behind. It’s funny: even though programming revolves around computers, when it comes to learning I much prefer books to digital videos and lectures. One of my favorite books for learning data science is definitely The Data Science Handbook. This book is much like the Bible (or insert your preferred religious text) when it comes to programming and data science.
What I found most helpful in this text was its explanation of the four most useful external libraries in all of data science: Pandas, NumPy, Matplotlib, and scikit-learn. Pandas and NumPy are absolute essentials when working with any collection of data, as they support efficient storage of, and mathematical computation on, a variety of data types. Matplotlib is the canonical library for visualizing data, while scikit-learn is the essential machine learning library.
All of these libraries are explored in great detail. Nearly all of the functions in each library are covered, along with the code needed to use them and helpful examples of their functionality. I found this book to be perhaps the most practical of all the resources, and I still refer to it to this day.
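To give a rough flavor of how these four libraries fit together, here is a minimal sketch of my own (it is not taken from the book). The data set, hours studied against an exam score, is synthetic, and every variable name is made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: generate a small synthetic data set (invented for this illustration).
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=50)
score = 5.0 * hours + 40.0 + rng.normal(0, 2.0, size=50)

# Pandas: hold the data in a labeled table and compute summary statistics.
df = pd.DataFrame({"hours": hours, "score": score})
summary = df.describe()

# scikit-learn: fit a simple linear model to the table.
model = LinearRegression()
model.fit(df[["hours"]], df["score"])
slope = float(model.coef_[0])
intercept = float(model.intercept_)

# Matplotlib: plot the raw points together with the fitted line.
fig, ax = plt.subplots()
ax.scatter(df["hours"], df["score"], label="data")
xs = np.linspace(0, 10, 100)
ax.plot(xs, slope * xs + intercept, color="red", label="fit")
ax.set_xlabel("hours")
ax.set_ylabel("score")
ax.legend()

# Save the figure to an in-memory buffer; pass a filename instead to write a PNG to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Because the synthetic scores are generated from a line with slope 5 and intercept 40 plus noise, the fitted `slope` and `intercept` should land close to those values, which makes for an easy sanity check on the whole pipeline.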
I am still constantly learning from this book, and considering how cheap it is, I definitely recommend you take a look at its contents. Knowing the topics it discusses will not only improve your programming skills but also help you ace interviews when landing your first job as a data scientist. You can find the discounted version by following this link.
Resource #4: Machine Learning With R
As my machine learning algorithms became significantly more complex and reliant on sophisticated mathematical models, my use of Python began to dwindle. In my experience, Python was not well suited to the kind of hyper-complex data I was modeling, especially data derived from biological models. However, even with a background in Python, R is not a particularly intuitive language.
Therefore, it was quite important for me not only to learn the language, but also to learn how to use it for machine learning. The Machine Learning With R book is one of the best resources I can recommend for broadening your programming tool kit. Not only will it teach you how to use R for machine learning, but even if you don’t already have a background in R, it will teach you the syntax of the language along the way.
This book gives you a lot of bang for your buck, especially because the paperback version runs around $20, and it has real potential to raise your salary. Myriad studies of data scientists show that those fluent in both Python and R have significantly higher salaries than those who know just one or the other. I highly recommend it, and you can find it by following the link here.
Resource #5: Genetic Algorithms With Python
Is it your dream to get a job as a programmer or data scientist? Would you enjoy working in a research lab in the field of bioinformatics or biology? I don’t blame you. I personally work in bioinformatics, and I know that this is a highly coveted position to be in. I would not be able to do this without a thorough understanding of biological components and of how to model them with machine learning and data science.
I majored in Biochemistry and Microbiology in college. However, little time in my courses was devoted to the methods of biological data modeling. Fortunately, this book, Genetic Algorithms With Python, taught me everything I needed to know in this regard. I know without a doubt that without this resource, I would not be able to work in the field of bioinformatic data modeling.
This book elaborated on so many nuanced subjects that I had not previously encountered. Furthermore, it is quite cheap. Therefore, if you want to go into bioinformatics and data science, it would be well worth your time and money to read this book cover to cover. You won’t be bored, you have my word. You can find the discounted version of the book at this location.
The Take Away
If you’ve come this far, you may not be satisfied with the offers listed above. Fortunately, the Art of Better Programming offers premium content to its members. In particular, we offer a 300-page PDF filled with the knowledge you need to begin a career in data science. If you’re interested in getting access to this PDF, place your order. Of all the resources here, it is by far the cheapest option available. By clicking the button below, you will gain access to all of this premium content.