# Gradient Descent and its Generalizations

## Learning Goal

The goal of this notebook is to build intuition for various gradient descent methods by visualizing them and applying them to some simple two-dimensional surfaces. Methods studied include ordinary gradient descent, gradient descent with momentum, Nesterov accelerated gradient (NAG), RMSProp, and ADAM. This notebook follows Notebook 2 and Section IV of the ML Review by Mehta et al.
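The methods listed above differ only in how they turn the gradient into a parameter update. The sketch below writes one step of each in a common textbook form. It is a hedged illustration, not this notebook's implementation: the function names, the `state` dictionary, the default hyperparameters (`eta`, `gamma`, `beta`, `beta1`, `beta2`, `eps`) and the test surface $z=x^2+y^2$ are all assumptions chosen for the example.

```python
import numpy as np

def grad_quadratic(params, a=1.0, b=1.0):
    # Gradient of the quadratic minimum z = a x^2 + b y^2
    x, y = params
    return np.array([2 * a * x, 2 * b * y])

def gd_step(params, grad, state, eta=0.1):
    # Plain gradient descent: theta <- theta - eta * grad(theta)
    return params - eta * grad(params), state

def momentum_step(params, grad, state, eta=0.1, gamma=0.9):
    # v <- gamma * v + eta * grad(theta);  theta <- theta - v
    v = gamma * state.get("v", np.zeros_like(params)) + eta * grad(params)
    return params - v, {"v": v}

def nag_step(params, grad, state, eta=0.1, gamma=0.9):
    # Same as momentum, but the gradient is taken at the look-ahead point theta - gamma * v
    v_prev = state.get("v", np.zeros_like(params))
    v = gamma * v_prev + eta * grad(params - gamma * v_prev)
    return params - v, {"v": v}

def rmsprop_step(params, grad, state, eta=0.01, beta=0.9, eps=1e-8):
    # A running average of squared gradients rescales the step in each coordinate
    g = grad(params)
    s = beta * state.get("s", np.zeros_like(params)) + (1 - beta) * g**2
    return params - eta * g / np.sqrt(s + eps), {"s": s}

def adam_step(params, grad, state, eta=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    # Running first and second moments of the gradient, with bias correction for early steps
    g = grad(params)
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", np.zeros_like(params)) + (1 - beta1) * g
    s = beta2 * state.get("s", np.zeros_like(params)) + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    s_hat = s / (1 - beta2**t)
    return params - eta * m_hat / (np.sqrt(s_hat) + eps), {"t": t, "m": m, "s": s}

def run(step, start, n_steps=200, **hyper):
    # Apply one update rule repeatedly from a starting point and return the final parameters
    params, state = np.array(start, dtype=float), {}
    for _ in range(n_steps):
        params, state = step(params, grad_quadratic, state, **hyper)
    return params
```

For example, `run(nag_step, [1.0, 1.0], n_steps=200, eta=0.1)` ends very close to the minimum at the origin. RMSProp and ADAM normalize the gradient, so their step size is set roughly by `eta` rather than by the gradient's magnitude; that is why this sketch gives them a smaller default `eta`.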

## Overview

In this notebook, we will visualize what different gradient descent methods are doing using some simple surfaces. From the outset, we emphasize that doing gradient descent on these surfaces is different from performing gradient descent on a loss function in Machine Learning (ML). The reason is that in ML we want not only to find good minima, but to find good minima that generalize well to new data. Despite this crucial difference, we can still build intuition about gradient descent methods by applying them to simple surfaces (for a useful blog post, see here).

## Surfaces

We will consider five simple surfaces:

• a quadratic minimum of the form

$$z(x,y)=ax^2+by^2,$$

• a saddle point of the form

$$z(x,y)=ax^2-by^2,$$

• Beale's function,

$$z(x,y) = (1.5-x+xy)^2+(2.25-x+xy^2)^2+(2.625-x+xy^3)^2,$$

• the Rosenbrock function,

$$z(x,y) = (1-x)^2 + 100(y-x^2)^2,$$

• and Himmelblau's function,

$$z(x,y) = (x^2+y-11)^2 + (x+y^2-7)^2.$$

The last three are non-convex functions that are commonly used as test problems for optimization algorithms. These surfaces can be plotted using the cells below.
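Each of the last three surfaces has a known global minimum with value zero: Beale's function at $(3, 0.5)$, the Rosenbrock function at $(1, 1)$, and Himmelblau's function at $(3, 2)$ (one of its four minima). Non-convexity can be confirmed with a single midpoint test, because a convex $f$ must satisfy $f\big((p+q)/2\big) \le \big(f(p)+f(q)\big)/2$ for every pair of points $p, q$. A minimal sketch; the function names and the hand-picked witness pairs are our own illustrative choices:

```python
def beale(x, y):
    return (1.5 - x + x*y)**2 + (2.25 - x + x*y**2)**2 + (2.625 - x + x*y**3)**2

def rosenbrock(x, y):
    return (1 - x)**2 + 100 * (y - x**2)**2

def himmelblau(x, y):
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Known global minima (all with function value 0); Himmelblau's function has three more
MINIMA = {beale: (3.0, 0.5), rosenbrock: (1.0, 1.0), himmelblau: (3.0, 2.0)}

def violates_midpoint_convexity(f, p, q):
    """Return True if f((p+q)/2) > (f(p)+f(q))/2, which rules out convexity of f."""
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return f(mx, my) > (f(*p) + f(*q)) / 2

# Hand-picked point pairs whose midpoint lies "above the chord"
WITNESSES = {
    beale: ((3.0, 0.5), (0.0, 4.0)),
    rosenbrock: ((-1.0, 1.0), (1.0, 1.0)),
    himmelblau: ((3.0, 2.0), (-3.0, 2.0)),
}

for f, (x0, y0) in MINIMA.items():
    print(f.__name__, "value at minimum:", f(x0, y0),
          "non-convex:", violates_midpoint_convexity(f, *WITNESSES[f]))
```

For the Rosenbrock pair, for instance, the endpoints give $f(-1,1)=4$ and $f(1,1)=0$, while the midpoint gives $f(0,1)=101$, far above the chord value of 2.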

In [1]:
#This cell sets up basic plotting functions we will use to visualize the gradient descent routines.

#Make plots interactive
%matplotlib notebook

#Make plots static
#%matplotlib inline

#Make 3D plots
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
from IPython.display import HTML
from matplotlib.colors import LogNorm

#Import Numpy
import numpy as np

#Define function for plotting

def plot_surface(x, y, z, azim=-60, elev=40, dist=10, cmap="jet"):
    # Create a figure with a single 3D axes for the surface
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    plot_args = {'rstride': 1, 'cstride': 1, 'cmap': cmap,
                 'linewidth': 20, 'antialiased': True,
                 'vmin': -2, 'vmax': 2}
    ax.plot_surface(x, y, z, **plot_args)
    ax.view_init(azim=azim, elev=elev)
    ax.dist = dist  # camera distance (deprecated in recent Matplotlib releases)
    ax.set_xlim(-1, 1)
    ax.set_ylim(-1, 1)
    ax.set_zlim(-2, 2)

    plt.xticks([-1, -0.5, 0, 0.5, 1], ["-1", "-1/2", "0", "1/2", "1"])
    plt.yticks([-1, -0.5, 0, 0.5, 1], ["-1", "-1/2", "0", "1/2", "1"])
    ax.set_zticks([-2, -1, 0, 1, 2])
    ax.set_zticklabels(["-2", "-1", "0", "1", "2"])

    ax.set_xlabel("x", fontsize=18)
    ax.set_ylabel("y", fontsize=18)
    ax.set_zlabel("z", fontsize=18)
    return fig, ax

def overlay_trajectory_quiver(ax,obj_func,trajectory, color='k'):
    # Draw each step of the trajectory as a 3D arrow lying on the surface
    xs=trajectory[:,0]
    ys=trajectory[:,1]
    zs=obj_func(xs,ys)
    ax.quiver(xs[:-1], ys[:-1], zs[:-1], xs[1:]-xs[:-1], ys[1:]-ys[:-1],zs[1:]-zs[:-1],color=color,arrow_length_ratio=0.3)

    return ax

def overlay_trajectory(ax,obj_func,trajectory,label,color='k'):
    # Draw the trajectory as a line on a 3D surface plot
    xs=trajectory[:,0]
    ys=trajectory[:,1]
    zs=obj_func(xs,ys)
    ax.plot(xs,ys,zs, color, label=label)

    return ax

def overlay_trajectory_contour(ax,trajectory, label,color='k',lw=2, plot_marker=False):
    # Draw the trajectory on a 2D contour plot, optionally marking its final point
    xs=trajectory[:,0]
    ys=trajectory[:,1]
    ax.plot(xs,ys, color, label=label,lw=lw)
    if plot_marker:
        ax.plot(xs[-1],ys[-1], color+'>', markersize=10)
    return ax
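The overlay helpers read a trajectory as a 2D array whose rows are successive $(x, y)$ iterates, taking `trajectory[:,0]` and `trajectory[:,1]` as the coordinates. A minimal sketch that produces an array in that layout with plain gradient descent on the quadratic minimum; `gd_trajectory`, `grad_minima` and the step size `eta=0.25` are hypothetical names and choices made for illustration:

```python
import numpy as np

def grad_minima(params, a=1.0, b=1.0):
    # Gradient of z = a x^2 + b y^2 (a constant offset in z would not change it)
    x, y = params
    return np.array([2 * a * x, 2 * b * y])

def gd_trajectory(grad, init, eta=0.25, n_steps=10):
    """Plain gradient descent that records every iterate.

    Returns an array of shape (n_steps + 1, 2); row k holds (x, y) after k steps.
    """
    traj = np.zeros((n_steps + 1, 2))
    traj[0] = init
    for k in range(n_steps):
        traj[k + 1] = traj[k] - eta * grad(traj[k])
    return traj

traj = gd_trajectory(grad_minima, init=[1.0, 1.0])
# Per-step displacements: the same differences the quiver helper uses as arrow vectors
steps = np.diff(traj, axis=0)
```

With `eta=0.25` every step halves both coordinates, since the gradient of $x^2+y^2$ is $(2x, 2y)$. An array like `traj` is what would be passed as the `trajectory` argument of the overlay helpers.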

In [2]:
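#DEFINE SURFACES WE WILL WORK WITH
# Each surface z(x, y) is paired with a gradient function that takes params = [x, y]
# and returns [dz/dx, dz/dy]. The grad_* names below are illustrative.

# Define monkey saddle and its gradient
def monkey_saddle(x,y):
    return x**3 - 3*x*y**2

def grad_monkey_saddle(params):
    x=params[0]
    y=params[1]
    return np.array([3*x**2 - 3*y**2, -6*x*y])

# Define saddle surface and its gradient
def saddle_surface(x,y,a=1,b=1):
    return a*x**2-b*y**2

def grad_saddle_surface(params,a=1,b=1):
    x=params[0]
    y=params[1]
    return np.array([2*a*x, -2*b*y])

# Define minima_surface (shifted down by 1 relative to a*x**2+b*y**2) and its gradient
def minima_surface(x,y,a=1,b=1):
    return a*x**2+b*y**2-1

def grad_minima_surface(params,a=1,b=1):
    x=params[0]
    y=params[1]
    return np.array([2*a*x, 2*b*y])

# Define Beale's function and its gradient
def beales_function(x,y):
    return (1.5-x+x*y)**2 + (2.25-x+x*y**2)**2 + (2.625-x+x*y**3)**2

def grad_beales_function(params):
    x=params[0]
    y=params[1]
    grad_x = 2*(1.5-x+x*y)*(y-1) + 2*(2.25-x+x*y**2)*(y**2-1) + 2*(2.625-x+x*y**3)*(y**3-1)
    grad_y = 2*(1.5-x+x*y)*x + 4*(2.25-x+x*y**2)*x*y + 6*(2.625-x+x*y**3)*x*y**2
    return np.array([grad_x, grad_y])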

def contour_beales_function():
    # Contour plot of Beale's function on log-spaced levels; the red star marks the global minimum at (3, 0.5)
    x, y = np.meshgrid(np.arange(-4.5, 4.5, 0.1), np.arange(-4.5, 4.5, 0.1))
    fig, ax = plt.subplots(figsize=(10, 6))
    z=beales_function(x,y)
    cax = ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap="RdYlBu_r")
    ax.plot(3,0.5, 'r*', markersize=18)

    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')

    ax.set_xlim((-4.5, 4.5))
    ax.set_ylim((-4.5, 4.5))

    return fig,ax

#Make plots of surfaces
plt.close() # closes previous plots
x, y = np.mgrid[-1:1:31j, -1:1:31j]
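Hand-derived gradients such as the one for Beale's function are easy to get wrong by a sign or a power. A central finite-difference comparison, $\partial_x z \approx \big(z(x+h,y)-z(x-h,y)\big)/2h$, is a cheap check. A minimal sketch; `grad_beales_function`, `numerical_grad`, the step `h=1e-6` and the random test points are assumptions made for the example:

```python
import numpy as np

def beales_function(x, y):
    return (1.5 - x + x*y)**2 + (2.25 - x + x*y**2)**2 + (2.625 - x + x*y**3)**2

def grad_beales_function(params):
    # Analytic gradient from the chain rule, term by term
    x, y = params
    t1, t2, t3 = 1.5 - x + x*y, 2.25 - x + x*y**2, 2.625 - x + x*y**3
    grad_x = 2*t1*(y - 1) + 2*t2*(y**2 - 1) + 2*t3*(y**3 - 1)
    grad_y = 2*t1*x + 2*t2*(2*x*y) + 2*t3*(3*x*y**2)
    return np.array([grad_x, grad_y])

def numerical_grad(f, params, h=1e-6):
    # Central differences in each coordinate
    x, y = params
    return np.array([(f(x + h, y) - f(x - h, y)) / (2*h),
                     (f(x, y + h) - f(x, y - h)) / (2*h)])

rng = np.random.default_rng(0)
points = rng.uniform(-2, 2, size=(20, 2))
# Largest error, scaled by the gradient's size so large and small gradients are comparable
max_rel_err = max(
    np.max(np.abs(grad_beales_function(p) - numerical_grad(beales_function, p)))
    / (1 + np.max(np.abs(grad_beales_function(p))))
    for p in points
)
```

At the minimum $(3, 0.5)$ all three residual terms vanish, so the analytic gradient there is exactly zero.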

In [3]:
fig1,ax1=plot_surface(x,y,monkey_saddle(x,y))
plt.show()

In [4]:
fig2,ax2=plot_surface(x,y,saddle_surface(x,y))
plt.show()
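The saddle surface plotted above shows why saddle points can stall plain gradient descent. Its gradient is $(2ax, -2by)$, so one step with learning rate $\eta$ multiplies $x$ by $1-2\eta a$ and $y$ by $1+2\eta b$. A run started exactly on the line $y=0$ therefore converges to the saddle at the origin, while any small offset in $y$ grows geometrically. A minimal sketch with $a=b=1$ and $\eta=0.1$ (our choices; `grad_saddle` and `gradient_descent` are illustrative names):

```python
import numpy as np

def grad_saddle(params, a=1.0, b=1.0):
    # Gradient of z = a x^2 - b y^2
    x, y = params
    return np.array([2 * a * x, -2 * b * y])

def gradient_descent(grad, init, eta=0.1, n_steps=100):
    # Plain gradient descent returning only the final point
    params = np.array(init, dtype=float)
    for _ in range(n_steps):
        params = params - eta * grad(params)
    return params

on_axis = gradient_descent(grad_saddle, [1.0, 0.0])     # starts exactly on the stable x-axis
perturbed = gradient_descent(grad_saddle, [1.0, 1e-3])  # tiny offset along the unstable y-direction
```

Here $x$ shrinks by a factor of 0.8 per step in both runs, whereas the offset in $y$ grows by a factor of 1.2 per step, so after 100 steps the perturbed run is far from the saddle even though it started only $10^{-3}$ away from the stable axis.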