import math
import numpy as np
import matplotlib.pyplot as plt
In this homework you will implement a numerical approach to gradient descent and use it to train a perceptron classifier. That will take place in stages described below, but we'll start by describing how to perform gradient descent numerically.
Given a function $f(x)$, the analytical approach to gradient descent - i.e., to finding the value of $x$ that minimizes $f(x)$ - is to compute $f'(x)$ and iterate as follows, where $\alpha \in (0, 1]$ is the learning rate:
$x \leftarrow x - \alpha f'(x)$
If you cannot compute $f'(x)$ analytically, you can estimate it as follows, for sufficiently small $\epsilon$:
$f'(x) \approx \frac{f(x + \epsilon) - f(x - \epsilon)}{2\epsilon}$
The method of finite differences assumes that over a very small interval the function behaves approximately linearly, so the slope of the tangent line can be estimated as the difference between the function values at the two ends of the interval divided by the interval's width.
For example, the slope of $f(x) = x^2$ at $x = 1$ can be estimated as $f'(1) \approx \frac{(1 + 0.001)^2 - (1 - 0.001)^2}{2 \cdot 0.001}$.
e = 0.001
(math.pow(1 + e, 2) - math.pow(1 - e, 2))/(2*e)
1.9999999999998352
Note that the number above is very close to what you would get analytically by taking derivatives: $f'(x) = 2x$, so $f'(1) = 2 \cdot 1 = 2$.
You'll use the method of finite differences to compute the derivative of a loss function with respect to the weights of a perceptron.
Below is a simple implementation of the method of finite differences for a univariate function. It is overly simple, running for a fixed number of iterations, and assuming constants for $\epsilon$ and $\alpha$, but it works for simple cases.
def fd_demo(f, x0):
    e = 0.001   # epsilon for the finite-difference estimate
    a = 0.01    # learning rate
    x = x0
    for _ in range(1000):
        g = (f(x + e) - f(x - e)) / (2*e)   # estimated derivative at x
        x = x - a * g                       # gradient descent step
    return x
# x*x is minimized at 0
f = lambda x: x * x
fd_demo(f, 2)
3.365934714445534e-09
# sin(x) is minimized at lots of points, the closest to 2 is 3pi/2
f = lambda x: math.sin(x)
print(fd_demo(f, 2))
print(math.pi * 3 / 2)
4.711986618234112
4.71238898038469
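The same idea extends to functions of several variables, which is what you'll need when the "variables" are the perceptron's weights: to estimate the partial derivative with respect to one weight, perturb just that weight by $\pm \epsilon$ and leave the others alone. Here is a minimal sketch of that idea (the names numeric_gradient and g are made up for illustration and are not part of the assignment).
# Estimate the gradient of a multivariate function by central differences,
# perturbing one coordinate at a time. The function g and the point passed
# to numeric_gradient below are made-up examples.
def numeric_gradient(g, w, e=0.001):
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += e
        w_minus[i] -= e
        grad.append((g(w_plus) - g(w_minus)) / (2 * e))
    return grad

g = lambda w: w[0]**2 + 3 * w[1]**2     # analytic gradient is [2*w[0], 6*w[1]]
numeric_gradient(g, [1.0, 2.0])         # approximately [2.0, 12.0]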
Below is a simple Perceptron class that you can use in this assignment. Feel free to use it as is, make changes, or throw it out and write your own. It creates a weight vector with small random numbers when initialized, and has methods to compute the activation for an input and to produce a binary class label for an input.
Note that the class below does not maintain an explicit bias term $b$. You can add one or, better yet, make sure that all inputs, $x$, have a 1 in one of the positions.
class Perceptron:
    def __init__(self, n):
        """
        n (int) - The length of x vectors that the classifier will process.
        """
        # Start with weights uniformly chosen from the range [-0.5, 0.5]
        self.weights = (np.random.rand(n) - 0.5).tolist()

    def activation(self, x):
        # Dot product of the input with the weight vector
        return np.dot(x, self.weights)

    def predict(self, x):
        # Binary class label: 1 if the activation is positive, otherwise -1
        return 1 if self.activation(x) > 0 else -1
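For concreteness, here is a quick usage sketch of the class above (not part of the assignment); the input vector is made up, with the leading 1 standing in for the bias input.
p = Perceptron(3)     # weights for [bias, x1, x2]
x = [1.0, 0.3, 1.7]   # made-up input with a leading 1 for the bias
p.activation(x)       # dot product of x with the weight vector
p.predict(x)          # 1 if the activation is positive, otherwise -1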
Fill in the function below to compute the hinge loss. Recall that the hinge loss is 0 if the activation and the class label have the same sign, otherwise it is -y * activation.
def loss_hinge(y, activation):
pass
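As an ungraded sanity check, the definition above pins down values like the ones below; these asserts assume your loss_hinge follows that definition exactly.
# Sanity checks implied by the definition above: the loss is 0 when y and the
# activation have the same sign, and -y * activation otherwise.
assert loss_hinge(1, 2.0) == 0       # same sign, no loss
assert loss_hinge(-1, -0.5) == 0     # same sign, no loss
assert loss_hinge(-1, 2.0) == 2.0    # opposite signs: -(-1) * 2.0
assert loss_hinge(1, -0.5) == 0.5    # opposite signs: -(1) * (-0.5)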
Fill in the function below, which takes the following arguments:
- clf - the Perceptron whose current weights should be used
- x - a single training example
- y - the class label for x (1 or -1)
- learning_rate - the learning rate $\alpha$
- loss_fn - the loss function to minimize
- epsilon - the $\epsilon$ used for the finite-difference estimates

The function must return a new set of weights to use in the perceptron after performing one step of gradient descent update using the training example and loss function. To do that it will:
- estimate the partial derivative of the loss with respect to each weight using the method of finite differences, perturbing one weight at a time by $\pm \epsilon$
- subtract learning_rate times each estimated partial derivative from the corresponding weight
- return the resulting weights

Note: Be careful not to modify the weights of the perceptron in place in the routine below.
def gd_step(clf, x, y, learning_rate, loss_fn, epsilon = 0.001):
pass
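As a rough illustration of the intended call pattern (a sketch, not graded code): one step should produce updated weights while leaving the classifier's own weights untouched. The example input below is made up.
# Hypothetical check that gd_step returns new weights without modifying
# clf.weights in place.
clf_check = Perceptron(3)
old_weights = list(clf_check.weights)
new_weights = gd_step(clf_check, [1.0, 0.3, 1.7], 1, 0.01, loss_hinge)
assert clf_check.weights == old_weights       # unchanged in place
assert len(new_weights) == len(old_weights)   # one updated value per weight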
The code below generates a simple 2D dataset of n positive examples followed by n negative examples. The cell after that plots them. The code also prepends a 1 in each example so that the bias term will simply correspond to the first weight.
n = 10
X = np.concatenate((np.random.rand(n, 2) + 1,
                    np.random.rand(n, 2)))
X = np.hstack((np.expand_dims(np.ones(2*n), 1), X))
Y = [1] * n + [-1] * n
colors = ['r'] * n + ['g'] * n
# Randomize the order of the instances just for fun.
# Save and restore the generator state so that X, Y, and colors
# are all shuffled with the same permutation.
rng = np.random.default_rng()
state = rng.bit_generator.state
rng.shuffle(X)
rng.bit_generator.state = state
rng.shuffle(Y)
rng.bit_generator.state = state
rng.shuffle(colors)
plt.scatter(X[:,1], X[:,2], c = colors)
If you've done everything above correctly, the code below will perform gradient descent to train the classifier.
Modify this code so that it runs until one epoch produces no classification errors rather than running for a fixed number of iterations.
clf = Perceptron(3)
for epoch in range(100):
    for x, y in zip(X, Y):
        clf.weights = gd_step(clf, x, y, 0.01, loss_hinge)
print(clf.weights)
Run full gradient descent 5 times and write a routine to convert the weights into slope/intercept form. Then use the function below to plot the hyperplanes learned by the perceptron along with the data in one graph. The second cell below shows how that can be done. Write a paragraph explaining what you see in the plot, touching on how much variation there is from run to run and whether the separators seem like "good" ones.
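One way to derive the conversion, assuming the weight layout used above (bias stored as the first weight): the decision boundary is where the activation is zero, i.e. $w_0 + w_1 x_1 + w_2 x_2 = 0$. Solving for $x_2$ (assuming $w_2 \neq 0$) gives
$x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$
so the slope is $-w_1 / w_2$ and the intercept is $-w_0 / w_2$.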
def weights_to_slope_intercept(weights):
pass
def abline(slope, intercept):
    """Plot a line from slope and intercept"""
    axes = plt.gca()
    x_vals = np.array(axes.get_xlim())
    y_vals = intercept + slope * x_vals
    plt.plot(x_vals, y_vals, '--')
plt.xlim([0, 2])
plt.ylim([0, 2])
plt.scatter(X[:,1], X[:,2], c = colors)
abline(-1, 2)
abline(2, 0)
Repeat the previous task, generating 5 plots of the separating hyperplanes, but this time use the loss function below.
How are the results different? Write a brief paragraph explaining why the results look different.
def loss_hinge_margin(y, activation):
    return 0 if y * activation > 1 else -y * activation + 1