How to Use Directional Derivatives For Machine Learning: Topics of Partial Derivatives


By this point in our series on partial derivatives, we have not only expanded upon the idea of multivariate equations, but also introduced partial derivatives and their associated tangent planes and linear approximations. If you have not read those articles, we highly recommend doing so, because the applications that follow depend on those fundamentals. In this article, we turn to directional derivatives. If you’d like to follow along with the primary resource for this article, you can find James Stewart’s Multivariable Calculus here.


Conceptualizing Directional Derivatives

From our previous study of multi-variable equations, and the partial derivatives thereof, we defined partial derivatives according to their limits as follows:

f_x(x_0,y_0)=\lim\limits_{h \to 0}\frac{f(x_0+h,y_0)-f(x_0,y_0)}{h}
f_y(x_0,y_0)=\lim\limits_{h \to 0}\frac{f(x_0,y_0+h)-f(x_0,y_0)}{h}

Using the partial derivatives of the function with respect to either ‘x’ or ‘y’, we are readily able to discern the rate of change of ‘z’ in either of these directions. Put in terms of vectors, f_x is the rate of change of ‘z’ in the direction of the unit vector ‘i’, and f_y is the rate of change of ‘z’ in the direction of the unit vector ‘j’. In this manner, we can describe the rate of change at a particular point using vectors.
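The limit definitions above can be approximated numerically by choosing a small ‘h’ instead of taking the limit. The sketch below is only an illustration: the surface z = x^2 * y and the helper names partial_x and partial_y are assumptions made for this example and do not come from the source text.

```python
def partial_x(f, x0, y0, h=1e-6):
    """Approximate f_x(x0, y0) with the difference quotient from the limit definition."""
    return (f(x0 + h, y0) - f(x0, y0)) / h


def partial_y(f, x0, y0, h=1e-6):
    """Approximate f_y(x0, y0) with the difference quotient from the limit definition."""
    return (f(x0, y0 + h) - f(x0, y0)) / h


def f(x, y):
    # Hypothetical example surface z = x^2 * y, chosen for illustration only.
    return x**2 * y


fx = partial_x(f, 1.0, 2.0)  # the exact partial derivative is 2*x*y = 4
fy = partial_y(f, 1.0, 2.0)  # the exact partial derivative is x^2 = 1
print(fx, fy)
```

Because ‘h’ is small but not zero, the printed values are close to 4 and 1 rather than exactly equal to them.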


Computing Directional Derivatives

Our leading source on directional derivatives has a lot to say on this matter. Consider an instance wherein we desire to compute the rate of change of ‘z’ at a particular point in an arbitrary direction, not just along ‘x’ or ‘y’. We can do so using a unit vector of the form u = <a, b>. Look back to our original multi-variable function of the form z=f(x,y). Such a function defines a surface we call ‘S’. In this case, if z0=f(x0,y0), then the point (x0,y0,z0) lies on the surface ‘S’ defined by z=f(x,y).

Now, suppose that a vertical plane passes through the point (x0,y0,z0) in the direction of the vector ‘u’. This plane intersects the surface ‘S’ in a curve ‘C’. The slope of the tangent line to ‘C’ at that point is the rate of change of ‘z’ in the direction of ‘u’. With this idea, we can compute the rate of change not only in the direction of ‘x’ or ‘y’, but in any intermediate direction as well.


This is precisely how we characterize directional derivatives. We properly define it as follows: The directional derivative of a multivariate function of the form z=f(x,y) at the point (x0,y0) in the direction of a unit vector u=<a,b> may be defined, provided the limit exists, as:

D_uf(x_0,y_0)=\lim\limits_{h \to 0}\frac{f(x_0+ha,y_0+hb)-f(x_0,y_0)}{h}
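The difference quotient in this definition can be evaluated directly for a small ‘h’. The following is a minimal sketch under stated assumptions: the surface z = x^2 * y and the function name directional_derivative are hypothetical choices for illustration, and the direction is normalized first because the definition requires a unit vector.

```python
import math


def directional_derivative(f, x0, y0, u, h=1e-6):
    """Approximate D_u f(x0, y0) using the difference quotient with a small h."""
    a, b = u
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm  # the definition assumes u = <a, b> is a unit vector
    return (f(x0 + h * a, y0 + h * b) - f(x0, y0)) / h


def f(x, y):
    # Hypothetical example surface z = x^2 * y, chosen for illustration only.
    return x**2 * y


d_i = directional_derivative(f, 1.0, 2.0, (1.0, 0.0))     # direction i, so this matches f_x
d_diag = directional_derivative(f, 1.0, 2.0, (1.0, 1.0))  # direction halfway between i and j
print(d_i, d_diag)
```

With u = <1, 0> the quotient reduces to the one that defines f_x, which is why the first value agrees with the partial derivative with respect to ‘x’.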

Furthermore, you can observe in the image below how the directional derivative takes form:


Defining Directional Derivatives with Partial Derivatives

We can take the understanding of directional derivatives from above and use our knowledge of partial derivatives to define and prove them. Suppose that our function ‘f’ is a differentiable function of ‘x’ and ‘y’. Then ‘f’ has a directional derivative in the direction of any unit vector ‘u’. If the vector ‘u’ is defined as u=<a,b>, then we can define the directional derivative with partial derivatives as:

D_uf(x,y)=f_x(x,y)a+f_y(x,y)b


We can prove this using our understanding of the single-variable derivative. Let’s define a single variable function ‘g’ of the variable ‘h’ as follows:

g(h)=f(x_0+ha,y_0+hb)


Taking the derivative of this function at h=0 returns:

g'(0)=\lim\limits_{h \to 0}\frac{g(h)-g(0)}{h}\newline g'(0)=\lim\limits_{h \to 0}\frac{f(x_0+ha,y_0+hb)-f(x_0,y_0)}{h}\newline g'(0)=D_uf(x_0,y_0)
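The computation above identifies g'(0) with the directional derivative. To reach the formula in terms of partial derivatives, one more step is needed. A sketch of the usual argument, assuming ‘f’ is differentiable: write g(h)=f(x,y) with x=x_0+ha and y=y_0+hb, apply the chain rule, and then set h=0.

```latex
g'(h)=\frac{\partial f}{\partial x}\frac{dx}{dh}+\frac{\partial f}{\partial y}\frac{dy}{dh}=f_x(x,y)\,a+f_y(x,y)\,b
g'(0)=f_x(x_0,y_0)\,a+f_y(x_0,y_0)\,b
D_uf(x_0,y_0)=g'(0)=f_x(x_0,y_0)\,a+f_y(x_0,y_0)\,b
```

The last line combines the two expressions for g'(0), one from the limit definition and one from the chain rule.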

Now, suppose that our unit vector ‘u’ makes an angle θ with the positive x-axis. Then we can write its components in terms of sine and cosine such that u=<cosθ,sinθ>. Thus, we can rewrite the directional derivative as:

D_uf(x,y)=f_x(x,y)\cos\theta+f_y(x,y)\sin\theta
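As a worked illustration of the angle form D_uf = f_x cosθ + f_y sinθ, the sketch below evaluates it for one function and one angle. The function f(x,y) = x^3 - 3xy + 4y^2, the point (1, 2), the angle θ = π/6, and the helper names are all assumptions picked for this example. The partial derivatives were computed by hand and hard-coded.

```python
import math


def f_x(x, y):
    # Partial derivative with respect to x of the assumed f(x, y) = x^3 - 3xy + 4y^2.
    return 3 * x**2 - 3 * y


def f_y(x, y):
    # Partial derivative with respect to y of the same assumed function.
    return -3 * x + 8 * y


def directional_derivative_angle(x, y, theta):
    """D_u f(x, y) for the unit vector u = <cos(theta), sin(theta)>."""
    return f_x(x, y) * math.cos(theta) + f_y(x, y) * math.sin(theta)


value = directional_derivative_angle(1.0, 2.0, math.pi / 6)
# By hand: f_x(1, 2) = -3 and f_y(1, 2) = 13, so the result is (13 - 3*sqrt(3)) / 2.
expected = (13 - 3 * math.sqrt(3)) / 2
print(value, expected)
```

Both printed numbers are approximately 3.9, which confirms that the hand calculation and the formula agree.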


Gradient Vectors

Our further comprehension of directional derivatives in terms of gradient vectors relies heavily on our understanding of vectors and their respective computations. Thus, if you need a refresher on this subject matter before embarking further, check out our previous series on vector calculus posted below this subsection.

We previously defined directional derivatives with respect to partial derivatives. However, we can also compute these derivatives using the dot product of two vectors. Let us observe how this can be done:

D_uf(x,y)=f_x(x,y)a+f_y(x,y)b\newline =< f_x(x,y),f_y(x,y)> \cdot < a,b>\newline =< f_x(x,y),f_y(x,y)> \cdot u

From the first vector in the above function, we may derive the gradient vector of the function. The gradient appears as follows:

∇f(x,y)=< f_x(x,y),f_y(x,y)>=\frac{\partial f}{\partial x}i+\frac{\partial f}{\partial y}j
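The dot product form translates almost directly into code. The sketch below is a minimal illustration: the function f(x,y) = x^2 * y, its hand-computed gradient, and the helper names gradient, dot, and unit are assumptions for this example only.

```python
import math


def gradient(x, y):
    # Gradient <f_x, f_y> of the assumed function f(x, y) = x^2 * y.
    return (2 * x * y, x**2)


def dot(v, w):
    """Dot product of two vectors given as tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))


def unit(v):
    """Scale a nonzero 2D vector to length 1."""
    n = math.hypot(*v)
    return tuple(c / n for c in v)


grad = gradient(1.0, 2.0)  # <4, 1>
u = unit((3.0, 4.0))       # <0.6, 0.8>
d_u = dot(grad, u)         # 4*0.6 + 1*0.8, roughly 3.2
print(grad, u, d_u)
```

Once the gradient is known at a point, the directional derivative in any direction costs only one dot product, which is the practical appeal of this form.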

Computing Maxima with Directional Derivatives

Now that we have defined directional derivatives and the various ways in which they may be computed, we can consider the derivative of the function in all directions at a given point. A common use of this ability is identifying the direction in which ‘z’ changes fastest, and how large that rate of change is. The following theorem gives both:

Theorem: Suppose ‘f’ is a differentiable multivariate function. Then the maximum value of the directional derivative D_uf(x) is |∇f(x)|, and it occurs when the vector ‘u’ has the same direction as the gradient vector ∇f(x).
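One way to build confidence in this result is to check it by brute force: evaluate the directional derivative over many unit vectors and compare the largest value with the length of the gradient. The sketch below assumes the example function f(x,y) = x^2 * y and a grid of 3600 angles. Both are arbitrary choices for illustration.

```python
import math


def gradient(x, y):
    # Gradient <f_x, f_y> of the assumed function f(x, y) = x^2 * y.
    return (2 * x * y, x**2)


gx, gy = gradient(1.0, 2.0)
grad_norm = math.hypot(gx, gy)    # |grad f| at the point (1, 2)
grad_angle = math.atan2(gy, gx)   # angle of the gradient from the positive x-axis

# Evaluate D_u f for u = <cos(t), sin(t)> on a grid of angles and keep the largest.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_value, best_theta = max(
    (gx * math.cos(t) + gy * math.sin(t), t) for t in angles
)
print(best_value, grad_norm)
print(best_theta, grad_angle)
```

The largest sampled value matches |∇f| up to the resolution of the grid, and the angle at which it occurs matches the angle of the gradient.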

Application of Maxima and the Gradient Vector

The gradient vector is a useful tool in a variety of contexts. So far in this series we have worked with functions of the form z=f(x,y), but the same ideas extend to a function ‘f’ of three variables and a point (x,y,z) in its domain. From the theorem above, the gradient vector ∇f(x,y,z) gives the direction of the greatest rate of increase of ‘f’. That direction is also perpendicular to the level surface ‘S’ of ‘f’ that passes through the point (x,y,z). This makes sense: moving along the level surface ‘S’, that is, perpendicular to the gradient vector, produces no change in ‘f’.

For a function of two variables, the gradient vector takes the form ∇f(x,y). As in the three-variable case, it gives the direction of the greatest rate of increase, and it is perpendicular to the level curve of ‘f’ that passes through the point (x,y).
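The perpendicularity between the gradient and a level curve can be checked on a simple case. In the sketch below, the function f(x,y) = x^2 + y^2 and the point (3, 4) are assumptions chosen because the level curve through that point is a circle of radius 5, whose tangent direction is easy to write down.

```python
import math


def gradient(x, y):
    # Gradient of the assumed function f(x, y) = x^2 + y^2.
    return (2 * x, 2 * y)


x0, y0 = 3.0, 4.0
gx, gy = gradient(x0, y0)  # <6, 8>

# Unit tangent to the level curve x^2 + y^2 = 25 at (3, 4).
tangent = (-y0 / 5.0, x0 / 5.0)
d_tangent = gx * tangent[0] + gy * tangent[1]  # rate of change along the level curve

# Unit vector in the direction of the gradient.
norm = math.hypot(gx, gy)
grad_dir = (gx / norm, gy / norm)
d_grad = gx * grad_dir[0] + gy * grad_dir[1]   # rate of change along the gradient
print(d_tangent, d_grad)
```

The rate of change along the level curve is zero up to rounding, while the rate of change along the gradient equals the gradient's length of 10.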


Consider a topographic map of a hill whose level curves are drawn from f(x,y), where the value of f(x,y) represents the height above sea level at the point (x,y). The route of steepest ascent at each point then follows the gradient vector, crossing every level curve at a right angle. This may be demonstrated as:

This is just one of the myriad of applications for the gradient vector.
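The steepest ascent idea can be sketched as a short loop that repeatedly steps in the direction of the gradient. Everything specific here is an assumption for illustration: the hill height(x,y) = 10 - (x-1)^2 - (y+2)^2, the step size, and the number of steps. Gradient descent, widely used to train machine learning models, applies the same idea with the sign flipped, stepping against the gradient to reduce a loss.

```python
def height(x, y):
    # Hypothetical hill with its summit at (1, -2) and a peak height of 10.
    return 10.0 - (x - 1.0) ** 2 - (y + 2.0) ** 2


def gradient(x, y):
    # Gradient <f_x, f_y> of the hypothetical height function above.
    return (-2.0 * (x - 1.0), -2.0 * (y + 2.0))


def steepest_ascent(start, step_size=0.1, steps=100):
    """Follow the gradient uphill from a starting point for a fixed number of steps."""
    x, y = start
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x += step_size * gx  # move a small distance in the gradient direction
        y += step_size * gy
    return x, y


peak = steepest_ascent((0.0, 0.0))
print(peak, height(*peak))
```

For this particular hill each step shrinks the distance to the summit by a constant factor, so the path ends very close to (1, -2). On less regular surfaces the step size usually needs more care.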

The Take Away

Computations on multivariate functions involving partial derivatives permit a much more holistic understanding of a function. Furthermore, we can identify unique aspects of the function by using these tools. Perhaps the most general and widely applicable are the tangent planes. However, the gradient vector also supports analysis across the whole function, which allows us to understand data in a more intimate fashion. If you desire to learn more about the applications of gradient vectors, this resource offers graduate level insights into partial derivatives. Furthermore, consider checking out the rest of our series on partial derivatives to get a more comprehensive view on the subject.


Modeling Functions of Multiple Variables

Limits and Continuity of Higher Variable Functions

A Guide to Executing Partial Derivatives

A Guide to Tangent Planes and Linear Approximations

