What are Pandas Ufuncs?
“Ufuncs” is short for universal functions. These functions come from the NumPy library, and they let a programmer execute vectorized operations that greatly improve the efficiency of computation on large data arrays. Before getting into the nuts and bolts of computations on Pandas data structures, it is essential to have a robust understanding of the data structures themselves. This article provides a deep introduction to the Pandas data structures, which will prove very helpful for using Pandas in a variety of settings. Furthermore, an additional article, found here, demonstrates the importance of Pandas data structures in building a web scraper of your own design. Check these out for extremely helpful insights.
If you have no previous experience with NumPy functions, have no concern. A future article will deal with the library and its various computations. Nevertheless, Pandas was designed to work in tandem with NumPy, so Ufuncs also apply to Pandas data structures. Herein, we explore the use of Ufuncs in executing computations on Pandas data.
How do Ufuncs Work?
As previously stated, Ufuncs execute vectorized operations on the data within a data structure. This means that a single call applies an operation to every element, with the element-by-element loop running in NumPy's compiled code rather than in Python. With a tool that operates in this fashion, there is no need to write explicit loops for repetitive operations. This design leads to significantly faster execution times, and also makes coding much easier once you know the syntax.
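To make the contrast concrete, here is a minimal sketch comparing an explicit loop with a single Ufunc call. The sample values and variable names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
values = pd.Series([1.0, 4.0, 9.0, 16.0])

# Explicit Python loop: one interpreter-level call per element.
looped = pd.Series([np.sqrt(v) for v in values], index=values.index)

# Ufunc: a single call; the element-wise loop runs in compiled code.
vectorized = np.sqrt(values)

# Both approaches give the same values; only the mechanism differs.
print(vectorized.equals(looped))
```

The two results match, so the Ufunc can replace the loop without changing the output.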
Categories of Pandas Ufuncs
Pandas Ufuncs fall into several categories, each covering a different kind of computation:
- Arithmetic Operations: Ufuncs support the basic mathematical operations and apply them rapidly to all elements in a data structure. These include addition, subtraction, multiplication, division, negation, exponentiation, and many others.
- Trigonometric Operations: Provided the data in the structure is in an appropriate form, Ufuncs can apply the sine, cosine, and tangent functions. Their inverses are also available.
- Logarithmic Operations: The logarithmic Ufuncs transform all elements of the Pandas data structures, well, logarithmically.
- Statistical Operations: Specialized functions, such as the gamma function (a generalization of the factorial) and two-argument (binary) functions such as the beta function, are available as Ufuncs through the SciPy library.
- Aggregate Operations: These Ufuncs reduce the data in the Pandas data structure. Each Ufunc has a reduce method that repeatedly applies the operation to collapse an array (or each row or column) to a single value.
All of these mechanisms will be examined individually, with examples that describe in detail how these Ufuncs work and provide insight into their outputs.
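Since aggregate operations are the least familiar item in the list above, here is a small sketch of the reduce method. The frame and its column names are hypothetical, and the data is converted to a plain NumPy array first so that the behavior shown is NumPy's own.

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only to illustrate reduction.
frame = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
array = frame.to_numpy()

# np.add.reduce collapses along an axis by repeated addition.
total_per_column = np.add.reduce(array, axis=0)

# np.maximum.reduce keeps the largest value along an axis.
max_per_row = np.maximum.reduce(array, axis=1)

print(total_per_column)
print(max_per_row)
```

Here axis=0 collapses the rows and leaves one value per column, while axis=1 leaves one value per row.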
Pandas Data Structures To Be Manipulated
First, we create several different Pandas data structures, to demonstrate how these operations work on different entities. These example structures include a Pandas series without a defined index and with data randomly generated, a Pandas series with a defined index and manually input data, a Pandas DataFrame with one data column and a Pandas DataFrame with two data columns. Let’s first observe the code for these and the structures they produce.
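A sketch of how the four structures described above might be defined follows. The variable names and the data values are assumptions; only the column names ‘numbers’ and ‘Party’ are taken from later sections of this article.

```python
import numpy as np
import pandas as pd

# Seeded generator so the "random" data is the same on every run.
rng = np.random.default_rng(seed=0)

# Series with the default integer index and randomly generated data.
random_series = pd.Series(rng.integers(0, 10, size=5))

# Series with an explicit index and manually entered data.
indexed_series = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# DataFrame with one data column.
single_col_df = pd.DataFrame({"numbers": [1, 2, 3, 4]})

# DataFrame with two data columns (values are assumed for illustration).
two_col_df = pd.DataFrame({"numbers": [1, 2, 3, 4], "Party": [5, 6, 7, 8]})

for name, obj in [
    ("random_series", random_series),
    ("indexed_series", indexed_series),
    ("single_col_df", single_col_df),
    ("two_col_df", two_col_df),
]:
    print(name)
    print(obj)
```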
Now, let us observe the structures this code defines:
Arithmetic Operations With Pandas Ufuncs
The traditional arithmetic operations (addition, multiplication, subtraction, division, and powers) are all available as NumPy Ufuncs. These Ufuncs apply vectorized arithmetic to large data sets for rapid computations. We apply each Ufunc to each data structure to observe how it behaves and the different results it produces.
Ufuncs Arithmetic: Addition
The Ufuncs mechanism for addition follows the form “np.add(data structure, value to add).” We apply it to the data structures we created with the following syntax:
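As a minimal sketch of this form, assuming small hand-made structures (the names and values here are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical structures standing in for the ones defined earlier.
indexed_series = pd.Series([10, 20, 30], index=["a", "b", "c"])
two_col_df = pd.DataFrame({"numbers": [1, 2, 3], "Party": [4, 5, 6]})

# Add five to every element; the index and column labels are preserved.
series_plus_five = np.add(indexed_series, 5)
frame_plus_five = np.add(two_col_df, 5)

print(series_plus_five)
print(frame_plus_five)
```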
In applying this code, a value of five is added to each value in the data array, which produces:
Note that in the data frame with two columns of data, a value of five is added to both columns, not just the ‘numbers’ column.
Ufuncs Arithmetic: Subtraction
The Ufuncs mechanism for subtraction follows the form “np.subtract(data structure, value to subtract).” We apply it to the data structures we created with the following syntax:
We observe here that the syntax for vectorized subtraction is quite similar to the Ufunc for addition. The only difference is that rather than adding a value of five to each element, this number is subtracted from each element. It produces the following data structures:
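A short sketch of the subtraction form, again on a hypothetical Series. It also checks that the Ufunc agrees with the ordinary minus operator.

```python
import numpy as np
import pandas as pd

# Hypothetical Series used only for illustration.
indexed_series = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Subtract five from every element.
series_minus_five = np.subtract(indexed_series, 5)

# The Ufunc and the "-" operator give the same result.
same_as_operator = series_minus_five.equals(indexed_series - 5)

print(series_minus_five)
print(same_as_operator)
```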
Ufuncs Arithmetic: Multiplication
The Ufuncs mechanism for multiplication follows the form “np.multiply(data structure, value to multiply by).” We apply it to the data structures we created with the following syntax:
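A minimal sketch of np.multiply with a scalar of two, using a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for illustration.
two_col_df = pd.DataFrame({"numbers": [1, 2, 3], "Party": [4, 5, 6]})

# Multiply every element in both columns by two.
doubled_frame = np.multiply(two_col_df, 2)

print(doubled_frame)
```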
Again, this form follows that of previous arithmetic operations, providing the set to which the operation is to be applied to, as well as the scalar by which to multiply the elements. Performing this Ufunc on our data structures produces the following results.
As you can see, every value in each of the data structures was multiplied by a value of ‘2’.
Ufuncs Arithmetic: Division
The Ufuncs mechanism for division follows the form “np.divide(data structure, value to divide by).” Note that dividing by zero does not halt the computation: NumPy returns inf (or NaN for zero divided by zero), and on plain NumPy arrays it also issues a RuntimeWarning. We apply it to the data structures we created with the following syntax:
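The sketch below shows ordinary division and, separately, what division by zero returns. The Series values are hypothetical, and np.errstate is used only to silence any warning so that the output stays clean.

```python
import numpy as np
import pandas as pd

# Hypothetical integer Series used only for illustration.
numbers = pd.Series([10, 20, 0])

# Ordinary division: the result is floating point even for integer input.
quarters = np.divide(numbers, 4)

# Division by zero: inf for nonzero numerators, NaN for 0 / 0.
with np.errstate(divide="ignore", invalid="ignore"):
    by_zero = np.divide(numbers, 0)

print(quarters)
print(by_zero)
```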
Executing this code produces data structures of the following form:
Note that the NumPy Ufunc for division returns floating-point values, even when the inputs are integers.
Trigonometric Operations With Pandas Ufuncs
When working with data, especially complex data, trigonometric transformations are sometimes necessary. NumPy's trigonometric Ufuncs, which interpret their input as angles in radians, apply these transformations element-wise to Pandas data structures. We can code these operations as follows.
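A minimal sketch of the three functions on a hypothetical Series of angles, given in radians:

```python
import numpy as np
import pandas as pd

# Hypothetical angles in radians: 0, 30 and 45 degrees.
angles = pd.Series(
    [0.0, np.pi / 6, np.pi / 4],
    index=["zero", "thirty_deg", "forty_five_deg"],
)

# Each Ufunc is applied to every element and keeps the index.
sines = np.sin(angles)
cosines = np.cos(angles)
tangents = np.tan(angles)

print(sines)
print(cosines)
print(tangents)
```

If the data is stored in degrees, np.radians can convert it before these functions are applied.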
In this code, we observe that the trigonometric functions of sine, cosine, and tangent are applied to each and every data structure. Furthermore, this operation is applied to each element thereof. I will show the output for the sine operation, as they are all relatively similar:
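Note that Ufuncs are also available for the inverse functions with the syntax np.arcsin, np.arccos, and np.arctan. However, be careful when using these: np.arcsin and np.arccos are only defined for values between -1 and 1 and return NaN for anything outside that range, while np.arctan accepts any real number.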
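The following sketch illustrates the domain restriction on a hypothetical Series that deliberately includes one out-of-range value:

```python
import numpy as np
import pandas as pd

# Hypothetical ratios; 2.0 lies outside the [-1, 1] domain on purpose.
ratios = pd.Series([-1.0, 0.0, 0.5, 2.0])

# arcsin and arccos return NaN for the out-of-range value.
with np.errstate(invalid="ignore"):
    arcsines = np.arcsin(ratios)
    arccosines = np.arccos(ratios)

# arctan is defined for every real number.
arctangents = np.arctan(ratios)

print(arcsines)
print(arccosines)
print(arctangents)
```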
Exponential and Logarithmic Operations With Pandas Ufuncs
Ufuncs for Exponentiation
With Ufuncs, exponentiation can be executed in several ways. np.exp computes e^x for every item x in the structure. np.exp2 computes 2^x for every item; it does not square the values. To raise every item to a chosen power, such as squaring or cubing, we use the np.power function. It looks a bit like this:
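A short sketch of the three functions on hypothetical data. Note the difference between np.exp2, which uses each value as the exponent of 2, and np.power, which uses each value as the base.

```python
import numpy as np
import pandas as pd

# Hypothetical data used only for illustration.
values = pd.Series([0, 1, 2, 3])
frame = pd.DataFrame({"numbers": [1, 2, 3], "Party": [2, 2, 2]})

e_to_x = np.exp(values)       # e**x for each element
two_to_x = np.exp2(values)    # 2**x for each element
cubed = np.power(values, 3)   # x**3 for each element

# A single scalar exponent is applied to every column of a DataFrame.
squared_frame = np.power(frame, 2)

print(e_to_x)
print(two_to_x)
print(cubed)
print(squared_frame)
```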
These operations produce the following data structures:
Take note of the last data structure, where every element in the ‘Party’ column is zero. This is because when working with multidimensional data, we need to specify multiple values by which to raise the data.
Ufuncs for Logarithms
As with exponentiation, logarithmic transformations are a frequent operation we must perform on data sets. This becomes especially apparent in attempts to normalize data. There are several ways to logarithmically transform data with Pandas Ufuncs. np.log takes the natural logarithm of the data, np.log2 takes the base 2 logarithm, and np.log10 takes the base 10 logarithm. Let us see the code for these functions:
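A minimal sketch of the three logarithms on a hypothetical Series, followed by the special case of a zero in the data:

```python
import numpy as np
import pandas as pd

# Hypothetical positive values used only for illustration.
values = pd.Series([1.0, 2.0, 10.0, 100.0])

natural = np.log(values)    # base e
base2 = np.log2(values)     # base 2
base10 = np.log10(values)   # base 10

# The logarithm of zero is -inf; errstate only silences the warning.
with np.errstate(divide="ignore"):
    log_of_zero = np.log(pd.Series([0.0, 1.0]))

print(natural)
print(base2)
print(base10)
print(log_of_zero)
```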
Let’s observe the data structures produced by these functions:
Note that in the first data structure, some of the values are returned as ‘-inf’. This is not an error that stops the computation; it is the consequence of taking the natural logarithm of zero, which NumPy reports as negative infinity.
Statistical Operations With Pandas Ufuncs
Perhaps the reason we are working with Pandas data structures in the first place is that we are applying statistical transformations to our data. There are a variety of Ufuncs available for this purpose, but these will be discussed exclusively in another article. Here, I provide the three that I consider most useful in normalizing data: the gamma, gammaln, and beta operations. These are not part of NumPy itself, so using them requires importing the ‘scipy’ library. The syntax for these functions appears as follows:
Now, I won’t get into the specifics of these functions and what they do, as that is beyond the scope of this article. However, I will show you the results so you can observe the type of data produced. Here it is:
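As a sketch of what such calls might look like, the following uses the scipy.special module on a hypothetical Series. The input values and the second argument passed to beta are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import special

# Hypothetical positive values used only for illustration.
values = pd.Series([1.0, 2.0, 5.0])

# gamma(n) equals (n - 1)! for positive integers n.
gamma_values = special.gamma(values)

# gammaln is the natural log of the absolute value of gamma.
log_gamma_values = special.gammaln(values)

# beta takes two arguments; here the second is fixed at 2.0.
beta_values = special.beta(values, 2.0)

print(gamma_values)
print(log_gamma_values)
print(beta_values)
```

gammaln is useful when gamma itself would overflow, since it works with the logarithm directly.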
The Take Away
Perhaps you can see the utility of executing these operations, as well as the ease with which they can be implemented. I won’t drown you with the data, but let me assure you that these operations run far more efficiently than the equivalent operations written with loops. If you have found this article useful, you might wish to check out a previous article that provides greater depth in understanding the Pandas data structures, as well as an additional article that demonstrates how these can be applied in building a web scraper of your own design. Additionally, check out this excerpt, which provides great knowledge on the use of Ufuncs.