Transition From Python To R: Using R in Data Science and ML

An Introduction To R

The Art of Better Programming has discussed machine learning and data science extensively from the perspective of Python. Studies show that Python is the most represented programming language in data science, appearing in over 50% of cases. If you want to be an employable data scientist, however, knowledge of R can set you apart from your competition. R differs from Python in quite a few ways, which can make learning it a bit difficult. Nevertheless, R’s power in statistical analysis makes it well worth knowing. Before we get started, I’d like to recommend a tool we will consult frequently as we consider the nuances of R. This book presents the essential features of R as they relate to data science, providing tutorials, examples, and significant amounts of code that demonstrate the use of R in statistical analysis and machine learning. Check it out here.

This first article in our new series on programming with R gives an introduction to R programming before diving into the complexities of R’s use in data science and machine learning. If you’d like to follow along with one of the best resources available for learning R with respect to data science, check out the following class. I have used it extensively myself; it has great reviews and over twenty hours of classes. Furthermore, I have acquired access to it to give it to all of you for over 97% off. Thus it’s quite cheap and can give you the edge in your data science pursuits. I highly recommend you check it out here at reduced cost and great benefit to you.

Conceptualizing Programming With R

Overview of R

As you might find in Machine Learning With R, R is an interpreted language rather than a compiled language, which means that typed commands are executed directly without first having to build a complete program. The language was developed specifically with statistics in mind, and thus tools like regression, probability distributions, and statistical tests are easily accessible to the programmer. As in Python, function arguments are always placed within parentheses, though these parentheses may be left empty. On the basis of its design, the mechanisms and mannerisms of R appear quite similar to Python.
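
As a small illustration of the points above, the following sketch calls a few of R’s built-in statistical functions. The sample values and variable names are invented for this example and do not come from any real dataset.

```r
# Hypothetical sample data, invented for this illustration
heights <- c(160, 172, 168, 181, 175)
weights <- c(55, 70, 65, 82, 74)

# Summary statistics are available without loading any package
avg_height <- mean(heights)
sd_height <- sd(heights)
print(avg_height)
print(sd_height)

# Regression: fit a simple linear model of weight on height
fit <- lm(weights ~ heights)
print(coef(fit))

# Distributions: the standard normal cumulative distribution at 0
p <- pnorm(0)
print(p)

# Test statistics: one-sample t-test against a hypothesized mean of 170
t_result <- t.test(heights, mu = 170)
print(t_result$statistic)

# A call with empty parentheses: Sys.Date() needs no arguments
today <- Sys.Date()
print(today)
```

Everything used here ships with a standard R installation, which is part of what makes the language convenient for statistical work.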

Data Objects in R

As with Python, data is stored in the script memory as an object, which is denoted by a particular name. Furthermore, as in Python, it is quite simple to perform various mathematical operations and functions on these objects. Thus, like Python, R may be thought of as an object oriented language. However, there are some important differences in R’s objects compared to Python.

As a specifically mathematical language, R has some extra object types which are innate to the language, including factors, vectors, matrices, and data frames. These object types will be investigated later. Nevertheless, if you’re looking for a thorough discussion of these object types, check out the following resource which investigates this subject in great detail.
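
Two of these types, factors and data frames, are not covered further in this article, so here is a brief sketch of what they look like. The names and ages below are made up for illustration.

```r
# A factor stores categorical data as a fixed set of levels
sizes <- factor(c("small", "large", "small", "medium"))
print(sizes)
print(levels(sizes))  # the distinct categories

# A data frame is a table whose columns may have different types
# (hypothetical values)
people <- data.frame(
  name = c("David", "Angie"),
  age = c(34, 29)
)
print(people)
print(nrow(people))   # number of rows
print(ncol(people))   # number of columns
```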

Creating Objects in R

R possesses its own syntax, which may seem confusing if you’re coming from a Python background. Perhaps the most significant difference is the method of object creation. In Python, we are used to creating an object using the ‘=’ operator, which binds a name to a value in the current namespace.

In R, however, the conventional assignment operator is ‘<-’ rather than ‘=’. It performs the same function, simply with a different form. As in Python, this symbol assigns a value to a name and ultimately creates an object. Furthermore, if the object already exists, then the assignment operator erases the previous value and stores the new value in the object. Take a look at some code that demonstrates how we may create objects in R:

> x <- 15 #Create an object 'x' stored with value '15' 
> x       #Run the object 'x' 
[1] 15    #Returns the value associated with 'x' as '15'

The code above shows how we create an object ‘x’ using the ‘<-’ operator. We can repeat the preceding code and then assign a new value to ‘x’, and we will find that the previous value of ‘x’ is erased in favor of the new value. Take a look:

> x <- 15 #Create an object 'x' stored with value '15' 
> x       #Run the object 'x' 
[1] 15    #Returns the value associated with 'x' as '15'
> x <- -2 #Reassign 'x' to have a value of '-2'
> x       #Run the object 'x'
[1] -2    #Return the new value of 'x', '-2', not '15'

In addition to overwriting previous values, another feature of R is that we can reverse the assignment operator as ‘->’, placing the value on the left and the name on the right. This syntax appears in the following manner:

> 15 -> x #Create object 'x' and assign it value '15' reversed
> x       #Run the object 'x' 
[1] 15    #Returns the value of 'x' which is '15'
> -2 -> x #Assign a new value, '-2' to 'x' in reverse
> x       #Run the object 'x' 
[1] -2    #Returns the value of 'x' which is '-2' 
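
For comparison, here is a minimal sketch of the assignment forms R accepts side by side. The variable names are hypothetical. Note that ‘<=’ in R is the less-than-or-equal comparison, so it should not be confused with the assignment arrow.

```r
a <- 15           # leftward arrow, the conventional form
b = 15            # '=' is also accepted for top-level assignment
15 -> c_val       # rightward arrow
assign("d", 15)   # function form; the name is given as a string

# All four names now refer to the same value
print(c(a, b, c_val, d))

# '<=' compares values and does not assign anything
is_small <- a <= 20
print(is_small)
```

Most R style guides favor ‘<-’, which is why it is the form used throughout this article.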

The previous examples demonstrated the creation of objects of the numeric type. However, we can create objects of other types as well. The counterpart of Python’s string object is R’s character object. A character value is written between single or double quotes. Take a look at the code that produces the character (string) object:

> name1 <- 'David' 
> name2 <- 'Angie' 
> name1 
[1] "David"
> name2 
[1] "Angie" 

Verifying Objects in Memory

One of the unique features of R which is also quite useful is our ability to list all of the objects in the script memory. This can be done with the ‘ls()’ function, which explicitly lists all of the objects found in the memory of our script. Take a look at how we might use this:

> ls() 
[1] "name1" "name2" "x"

Note that the ‘ls()’ function returns the names of the objects, sorted alphabetically, not the values associated with the objects.

Another cool feature of R is the fact that we can include multiple commands on the same line, simply by separating the commands with semi-colons. This can be helpful for condensing mundane information on the same line. Take a look at how we might encode these features and list them in R:

> name3 <- 'Elliot'; y <- 42; z <- 3; on <- 8
> ls()
[1] "name1" "name2" "name3" "on"    "x"     "y"     "z"

We can exert even greater control over the objects returned by the ‘ls()’ function. This may be done with the ‘pattern’ argument, which can be abbreviated to ‘pat’. It accepts a regular expression, and only objects whose names match it are returned. For example, if we only wanted to find objects that have an “n” in their name, we can set the ‘pat’ argument equal to ‘n’. Take a look at how we execute this in R:

> ls(pat="n") 
[1] "name1" "name2" "name3" "on" 

Furthermore, we can specifically target objects whose names start with the letter ‘n’ by passing “^n” to the ‘pat’ argument, where ‘^’ anchors the match to the start of the name. Take a look at how we encode this feature:

> ls(pat="^n") 
[1] "name1" "name2" "name3" 
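
The same filtering can be written with the argument’s full name, ‘pattern’. The sketch below recreates a few of the objects from this article and stores the matches in variables; the variable names starts_with_n and contains_n are hypothetical. Because ‘ls()’ reports whatever happens to be in memory, the printed result depends on your session.

```r
# Recreate a few of the objects used in this article
name1 <- "David"
name2 <- "Angie"
on <- 8
x <- -2

# 'pattern' is the full argument name; 'pat' works by partial matching
starts_with_n <- ls(pattern = "^n")
contains_n <- ls(pattern = "n")

print(starts_with_n)
print(contains_n)
```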

Identifying Object Features In R

In addition to using the ‘ls()’ function to list the different objects in memory, we can employ the ‘ls.str()’ function, which lists each object in memory along with its type and its value. For example, take a look at how we do this in code:

> ls.str() 
name1 :  chr "David"
name2 :  chr "Angie"
name3 :  chr "Elliot"
on :  num 8
x :  num -2
y :  num 42
z :  num 3

As you may see in Zumel and Mount’s landmark publication on data science using R, in addition to a name and a value, objects are also described by their attributes. If you have not already taken a look at their discussion of the features of data science in R, I highly recommend you do so. Here we look at two attributes that every object in R has: mode and length.

According to the following source, the mode reflects the basic type of the elements which constitute the object, and the length reflects the number of elements in the object. Both of these attributes can be queried using the ‘mode()’ and ‘length()’ functions. Take a look at how we inspect these attributes in code:

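> x <- 15 
> name1 <- 'David' 
> mode(x) 
[1] "numeric" 
> mode(name1) 
[1] "character" 
> length(x)      #'x' is a vector holding one number
[1] 1 
> length(name1)  #'name1' is a vector holding one string
[1] 1 
> nchar(name1)   #Number of characters in the string itself
[1] 5 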

No matter what the mode of the object is, missing data within the object are always represented as NA (not available).
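
A short sketch may help separate these ideas: ‘length()’ counts the elements of an object, ‘nchar()’ counts the characters of a string, and NA marks a missing element without changing the mode. The ages vector is a hypothetical example.

```r
x <- 15
name1 <- "David"

# Both objects are vectors holding a single element
print(length(x))
print(length(name1))

# nchar() counts the characters inside the string
print(nchar(name1))

# NA marks a missing value; the vector keeps its mode
ages <- c(34, NA, 29)   # hypothetical values
print(mode(ages))
print(is.na(ages))

# Many functions can skip missing values on request
print(mean(ages, na.rm = TRUE))
```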

Creating Vectors in R

Hopefully you have been following along with the Art of Better Programming, and thus you already have a thorough background in the nature and use of vectors in a variety of capacities. R makes it quite simple to create vectors, both numeric and textual. For example, an apple has three primary attributes: ‘red’, ‘round’, and ‘fruit’. Thus, we can model an apple with a vector composed of three elements. Take a look at how we may code this vector in R:

> Apple <- c('red', 'round', 'fruit') #Define a vector with function 'c()'
> print(Apple)                        #Return elements of Apple
[1] "red"   "round" "fruit" 
> print(class(Apple))                 #Return class of the Apple vector
[1] "character" 

Here, we use the function ‘c()’, which combines its arguments, to create a vector in R. A vector is an ordered, one-dimensional sequence of values. Vectors are distinctive in that all of their elements must be of the same type; if values of different types are combined, R converts them to a common type.
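
The single-type rule is easiest to see when values of different types are combined. In this sketch (the variable names mixed and nums are hypothetical), R silently converts the elements to one common type.

```r
# Mixing a number, a string, and a logical yields a character vector
mixed <- c(1, "two", TRUE)
print(mixed)
print(class(mixed))

# Mixing numbers and logicals yields a numeric vector
nums <- c(1, TRUE, FALSE)
print(nums)
print(class(nums))

# Elements are accessed by position, counting from 1 rather than 0
Apple <- c("red", "round", "fruit")
print(Apple[1])
print(length(Apple))
```

Python users should note the last point in particular: indexing in R starts at 1.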

Creating List Objects in R

As in Python, the list object stores different objects in a particular sequence. The feature that differentiates a list from a vector in R is that a list can store objects of different types. We can create a list object using the function ‘list()’, passing in objects of various types, including vectors, numbers, characters, and factors. Take a look at how we might create a list using the code below:

> list1 <- list(c(2,5,3), 21.3, sin, "David") 
> print(list1) 
[[1]]
[1] 2 5 3

[[2]] 
[1] 21.3 

[[3]]
function (x)  .Primitive("sin") 

[[4]] 
[1] "David" 

Evidently, when printing the list object, we receive an individual output for each separate object within the list.
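
To work with the individual elements, a list can be indexed with double or single brackets. The following is a minimal sketch using the same list; the variable names first, sub, and f are hypothetical.

```r
list1 <- list(c(2, 5, 3), 21.3, sin, "David")

# Double brackets extract the element itself
first <- list1[[1]]
print(first)

# Single brackets return a shorter list containing the element
sub <- list1[1]
print(class(sub))

# A function stored in a list can be extracted and called
f <- list1[[3]]
print(f(0))

# Inspect the class of every element
print(sapply(list1, class))
```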

Creating Matrices in R

Vectors and lists in R are one-dimensional objects with a defined length. Matrices differ from these objects in that they are two-dimensional (arrays generalize this to any number of dimensions). A matrix can be conceptualized as a table with rows and columns, in which every element has the same type. We employ the ‘matrix()’ function to create the matrix object. Take a look at how we create a matrix with the following code:

> Matrix_A <- matrix(c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
                     nrow=3, ncol=3, byrow=TRUE) 
> print(Matrix_A) 
     [,1] [,2] [,3]
[1,] "A"  "A"  "A"
[2,] "B"  "B"  "B"
[3,] "C"  "C"  "C"     

We use the nrow and ncol arguments to specify the number of rows and the number of columns in the matrix, respectively. Furthermore, we set the byrow argument to TRUE to specify that the data fill the matrix row by row: once one row is full, filling moves on to the next. If we set this argument to FALSE (the default), the data fill the matrix column by column, and we obtain a matrix of a different conformation:

> Matrix_B <- matrix(c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
                     nrow=3, ncol=3, byrow=FALSE) 
> print(Matrix_B) 
     [,1] [,2] [,3]
[1,] "A"  "B"  "C"
[2,] "A"  "B"  "C"
[3,] "A"  "B"  "C" 
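
The effect of byrow is perhaps easier to follow with numbers. This sketch fills two 2 by 3 matrices from the same data, once by row and once by column; the names by_row and by_col are hypothetical.

```r
# Fill the same six values by row and by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)

print(by_row)
print(by_col)

# dim() reports the number of rows and columns
print(dim(by_row))

# Elements are addressed as [row, column]
print(by_row[1, 2])   # second value placed when filling by row
print(by_col[1, 2])   # third value placed when filling by column
```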

Creating Time Series in R

Time series are unique objects that are not commonly seen in Python, and they have some quite effective applications in a variety of coding contexts. The function ‘ts()’ creates an object of class “ts” from a vector (a single time series) or a matrix (a multivariate time series), together with some options that characterize the series. Take a look at the arguments of the ‘ts()’ function:

ts(data = NA, start = 1, end = numeric(0), frequency = 1,
   deltat = 1, ts.eps = getOption("ts.eps"), class, names)

The following resource provides deep insight into the features of time-series objects in R, among other objects and the language as a whole. I highly recommend it, and you can take a look at it and get a preview (and discount) here. Nevertheless, we take a moment to describe the most important of these arguments.

First of all, the data argument is a vector or matrix that supplies the observations of the time series. Either may be used because a time series may be univariate or multivariate. Additionally, the start and end arguments give the times of the first and last observations, each either as a single number or as a vector of two integers (for example, a year and a period within that year). The frequency argument gives the number of observations per unit of time.

The deltat argument gives the fraction of the sampling period between successive observations (for example, 1/12 for monthly data). Only one of frequency and deltat should be supplied. The class argument is the class given to the object, which by default is “ts” for a single series and a class vector beginning with “mts” and “ts” for a multivariate series. Finally, the names argument is a character vector naming the individual series in a multivariate series.

Take a look at how we may encode several different time series using R:
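
A minimal sketch, using invented values, of a univariate monthly series and a multivariate quarterly series. The object names and the series names “north” and “south” are hypothetical.

```r
# Hypothetical monthly observations for the year 2020
monthly <- ts(c(12, 15, 14, 18, 21, 25, 27, 26, 22, 19, 15, 13),
              start = c(2020, 1), frequency = 12)
print(monthly)
print(class(monthly))
print(start(monthly))       # time of the first observation
print(end(monthly))         # time of the last observation
print(frequency(monthly))   # observations per unit of time

# The same spacing described with deltat instead of frequency
monthly2 <- ts(1:12, start = c(2020, 1), deltat = 1/12)
print(frequency(monthly2))

# A multivariate quarterly series built from a two-column matrix
quarterly <- ts(matrix(1:8, ncol = 2), start = c(2021, 1),
                frequency = 4, names = c("north", "south"))
print(quarterly)
print(class(quarterly))
```

Supplying start as a two-element vector, such as c(2020, 1), means the first period of 2020; with frequency = 12 that is January.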

The Take Away

Within this article, we investigated the most prominent means of creating various object types in R, as well as the important attributes associated with these different objects. Furthermore, we covered the ideas underlying the different objects in R and how they differ between R and Python. As stated previously, R is an essential facet of data science, and understanding its intricacies is one of the best things you can do to improve your chances of obtaining a high-ranking career in data science.

Throughout the course of this series, we will elaborate on the most important nuances of R and bring you up to date with some of its fascinating features. As we pursue further knowledge of the data science world, we frequently consult the following source, which is a comprehensive tool for data science in an academic format.

It can be pricey in some regards, but I have managed to offer it at a highly discounted price. I am not exaggerating when I say this could be one of the most impactful resources you consult as you continue to climb the ladder of R, Python, and data science. It is loaded with tutorials, examples, and deep discussion of the theory underlying the various features of data science with respect to R and Python. I highly recommend you check it out here and, by following the link, obtain its tools at a highly discounted price.
