import math
import numpy as np
import matplotlib.pyplot as plt
In this homework you will implement a numerical approach to gradient descent and use it to train a perceptron classifier. That will take place in stages described below, but we'll start by describing how to perform gradient descent numerically.
Given a function $f(x)$, the analytical approach to gradient descent - i.e., to finding the value of $x$ that minimizes $f(x)$ - is to compute $f'(x)$ and iterate as follows, where $\alpha \in (0, 1]$ is the learning rate:
$x \leftarrow x - \alpha f'(x)$
If you cannot compute $f'(x)$ analytically, you can estimate it as follows, for sufficiently small $\epsilon$:
$f'(x) \approx \frac{f(x + \epsilon) - f(x - \epsilon)}{2\epsilon}$
The method of finite differences assumes that over a very small interval the function behaves approximately linearly, so the slope of the tangent line can be estimated as the difference between the function values at the two ends of the interval divided by the interval's width.
For example, the slope of $f(x) = x^2$ at $x = 1$ can be estimated as $f'(1) \approx \frac{(1 + 0.001)^2 - (1 - 0.001)^2}{2 \cdot 0.001}$.
e = 0.001
(math.pow(1 + e, 2) - math.pow(1 - e, 2))/(2*e)
1.9999999999998352
Note that the number above is very close to what you would get analytically by taking derivatives: $f'(x) = 2x$, so $f'(1) = 2 \cdot 1 = 2$.
You'll use the method of finite differences to compute the derivative of a loss function with respect to the weights of a perceptron.
Below is a simple implementation of the method of finite differences for a univariate function. It is overly simple, running for a fixed number of iterations, and assuming constants for $\epsilon$ and $\alpha$, but it works for simple cases.
def fd_demo(f, x0):
    e = 0.001   # epsilon for the finite-difference estimate
    a = 0.01    # learning rate
    x = x0
    for _ in range(1000):
        g = (f(x + e) - f(x - e)) / (2*e)   # estimated derivative at x
        x = x - a * g                       # gradient descent step
    return x
# x*x is minimized at 0
f = lambda x: x * x
fd_demo(f, 2)
3.365934714445534e-09
# sin(x) is minimized at lots of points, the closest to 2 is 3pi/2
f = lambda x: math.sin(x)
print(fd_demo(f, 2))
print(math.pi * 3 / 2)
4.711986618234112
4.71238898038469
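The same idea extends to functions of several variables, which is what you'll need when the "variables" are the perceptron's weights: to estimate the partial derivative with respect to one weight, perturb just that weight by $\pm \epsilon$ and leave the others alone. Here is a minimal sketch of that idea (the names numeric_gradient and g are made up for illustration and are not part of the assignment).
# Estimate the gradient of a multivariate function by central differences,
# perturbing one coordinate at a time. The function g and the point passed
# to numeric_gradient below are made-up examples.
def numeric_gradient(g, w, e=0.001):
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += e
        w_minus[i] -= e
        grad.append((g(w_plus) - g(w_minus)) / (2 * e))
    return grad

g = lambda w: w[0]**2 + 3 * w[1]**2     # analytic gradient is [2*w[0], 6*w[1]]
numeric_gradient(g, [1.0, 2.0])         # approximately [2.0, 12.0]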
Below is a simple Perceptron class that you can use in this assignment. Feel free to use it as is, make changes, or throw it out and write your own. It creates a weight vector with small random numbers when initialized, and has methods to compute the activation for an input and to produce a binary class label for an input.
Note that the class below does not maintain an explicit bias term $b$. You can add one or, better yet, make sure that all inputs, $x$, have a 1 in one of the positions.
class Perceptron:
    def __init__(self, n):
        """
        n (int) - The length of x vectors that the classifier will process.
        """
        # Start with weights uniformly chosen from the range [-0.5, 0.5]
        self.weights = (np.random.rand(n) - 0.5).tolist()

    def activation(self, x):
        # Dot product of the input with the weight vector
        return np.dot(x, self.weights)

    def predict(self, x):
        # Binary class label: 1 if the activation is positive, otherwise -1
        return 1 if self.activation(x) > 0 else -1
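For concreteness, here is a quick usage sketch of the class above (not part of the assignment); the input vector is made up, with the leading 1 standing in for the bias input.
p = Perceptron(3)     # weights for [bias, x1, x2]
x = [1.0, 0.3, 1.7]   # made-up input with a leading 1 for the bias
p.activation(x)       # dot product of x with the weight vector
p.predict(x)          # 1 if the activation is positive, otherwise -1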
Fill in the function below to compute the hinge loss. Recall that the hinge loss is 0 if the activation and the class label have the same sign, otherwise it is -y * activation.
def loss_hinge(y, activation):
pass
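As an ungraded sanity check, the definition above pins down values like the ones below; these asserts assume your loss_hinge follows that definition exactly.
# Sanity checks implied by the definition above: the loss is 0 when y and the
# activation have the same sign, and -y * activation otherwise.
assert loss_hinge(1, 2.0) == 0       # same sign, no loss
assert loss_hinge(-1, -0.5) == 0     # same sign, no loss
assert loss_hinge(-1, 2.0) == 2.0    # opposite signs: -(-1) * 2.0
assert loss_hinge(1, -0.5) == 0.5    # opposite signs: -(1) * (-0.5)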
Fill in the function below, which takes the following arguments:
- clf - the Perceptron whose current weights should be used
- x - a single training example
- y - the class label for x (1 or -1)
- learning_rate - the learning rate $\alpha$
- loss_fn - the loss function to minimize
- epsilon - the $\epsilon$ used for the finite-difference estimates

The function must return a new set of weights to use in the perceptron after performing one step of gradient descent update using the training example and loss function. To do that it will:
- estimate the partial derivative of the loss with respect to each weight using the method of finite differences, perturbing one weight at a time by $\pm \epsilon$
- subtract learning_rate times each estimated partial derivative from the corresponding weight
- return the resulting weights

Note: Be careful not to modify the weights of the perceptron in place in the routine below.
def gd_step(clf, x, y, learning_rate, loss_fn, epsilon = 0.001):
pass
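As a rough illustration of the intended call pattern (a sketch, not graded code): one step should produce updated weights while leaving the classifier's own weights untouched. The example input below is made up.
# Hypothetical check that gd_step returns new weights without modifying
# clf.weights in place.
clf_check = Perceptron(3)
old_weights = list(clf_check.weights)
new_weights = gd_step(clf_check, [1.0, 0.3, 1.7], 1, 0.01, loss_hinge)
assert clf_check.weights == old_weights       # unchanged in place
assert len(new_weights) == len(old_weights)   # one updated value per weight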
The code below generates a simple 2D dataset of n positive examples followed by n negative examples. The cell after that plots them. The code also prepends a 1 in each example so that the bias term will simply correspond to the first weight.
n = 10
X = np.concatenate((np.random.rand(n, 2) + 1,
                    np.random.rand(n, 2)))
X = np.hstack((np.expand_dims(np.ones(2*n), 1), X))
Y = [1] * n + [-1] * n
colors = ['r'] * n + ['g'] * n
# Randomize the order of the instances just for fun.
# Save and restore the generator state so that X, Y, and colors
# are all shuffled with the same permutation.
rng = np.random.default_rng()
state = rng.bit_generator.state
rng.shuffle(X)
rng.bit_generator.state = state
rng.shuffle(Y)
rng.bit_generator.state = state
rng.shuffle(colors)
plt.scatter(X[:,1], X[:,2], c = colors)
If you've done everything above correctly, the code below will perform gradient descent to train the classifier.
Modify this code so that it runs until one epoch produces no classification errors rather than running for a fixed number of iterations.
clf = Perceptron(3)
for epoch in range(100):
    for x, y in zip(X, Y):
        clf.weights = gd_step(clf, x, y, 0.01, loss_hinge)
print(clf.weights)
Run full gradient descent 5 times and write a routine to convert the weights into slope/intercept form. Then use the function below to plot the hyperplanes learned by the perceptron along with the data in one graph. The second cell below shows how that can be done. Write a paragraph explaining what you see in the plot, touching on how much variation there is from run to run and whether the separators seem like "good" ones.
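One way to derive the conversion, assuming the weight layout used above (bias stored as the first weight): the decision boundary is where the activation is zero, i.e. $w_0 + w_1 x_1 + w_2 x_2 = 0$. Solving for $x_2$ (assuming $w_2 \neq 0$) gives
$x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$
so the slope is $-w_1 / w_2$ and the intercept is $-w_0 / w_2$.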
def weights_to_slope_intercept(weights):
pass
def abline(slope, intercept):
    """Plot a line from slope and intercept"""
    axes = plt.gca()
    x_vals = np.array(axes.get_xlim())
    y_vals = intercept + slope * x_vals
    plt.plot(x_vals, y_vals, '--')
plt.xlim([0, 2])
plt.ylim([0, 2])
plt.scatter(X[:,1], X[:,2], c = colors)
abline(-1, 2)
abline(2, 0)
Repeat the previous task, generating 5 plots of the separating hyperplanes, but this time use the loss function below.
How are the results different? Write a brief paragraph explaining why the results look different.
def loss_hinge_margin(y, activation):
    return 0 if y * activation > 1 else -y * activation + 1