In this homework you will:
Recall that logistic regression learns a weight vector $w$ such that $w \cdot x \gg 0$ for positive instances and $w \cdot x \ll 0$ for negative instances. Below you'll look at the weights that were learned and think about which features are important.
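As a quick illustration of that decision rule (using made-up weights and instances, not the wine model), the sign of the dot product $w \cdot x$ determines the predicted class:

```python
import numpy as np

# Hypothetical 3-dimensional weight vector and two hypothetical instances
w = np.array([2.0, -1.0, 0.5])
x_pos = np.array([3.0, 0.5, 1.0])   # w·x = 6.0 - 0.5 + 0.5 = 6.0, well above 0
x_neg = np.array([-2.0, 3.0, 1.0])  # w·x = -4.0 - 3.0 + 0.5 = -6.5, well below 0

print(np.dot(w, x_pos))  # 6.0  -> classified positive
print(np.dot(w, x_neg))  # -6.5 -> classified negative
```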
Remember that the implementation of multi-class logistic regression used here in scikit-learn is one-vs-rest (also called one-vs-all): a separate binary classifier, with its own weight vector, is trained for each class.
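One way to see what one-vs-rest means concretely is scikit-learn's explicit `OneVsRestClassifier` wrapper (a sketch for illustration; the homework code below just calls `LogisticRegression` directly). It trains one binary logistic regression per class, each with its own 13-dimensional weight vector:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_wine(return_X_y=True)

# max_iter is raised here only so lbfgs converges on the unscaled data
ovr = OneVsRestClassifier(LogisticRegression(max_iter=10000))
ovr.fit(X, y)

print(len(ovr.estimators_))          # 3: one binary classifier per wine class
print(ovr.estimators_[0].coef_.shape)  # (1, 13): one weight per feature
```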
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_wine
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score
The wine dataset has 13 features that are real-valued and all positive. That last bit is important for what follows. The goal is to classify a sample of wine, characterized by its 13 features, into one of three types of wine.
data = load_wine()
X = data['data']
y = data['target']
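You can verify the claims above directly from the loaded data: 13 features, every feature value strictly positive, and three target classes.

```python
from sklearn.datasets import load_wine
import numpy as np

data = load_wine()
X = data['data']
y = data['target']

print(X.shape)       # (178, 13): 178 samples, 13 features
print(X.min() > 0)   # True: every feature value is strictly positive
print(np.unique(y))  # [0 1 2]: three wine classes
```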
The plot below shows the weights associated with all 13 features for each of the three classes. They are overlaid so that you can compare weights across classes.
clf = LogisticRegression(C = 1)
clf.fit(X, y)
x = list(range(13))
plt.plot(x, clf.coef_[0], label=data['target_names'][0])
plt.plot(x, clf.coef_[1], label=data['target_names'][1])
plt.plot(x, clf.coef_[2], label=data['target_names'][2])
plt.legend()
plt.xticks(x, data['feature_names'], rotation ='vertical')
plt.show()
/Users/oates/tmp/env/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
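As the warning suggests, one fix (a sketch, not part of the assignment) is to standardize the features before fitting, for example with a `StandardScaler` pipeline. Note that scaling changes the learned weights, so the plot above would look different:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Standardizing the 13 features lets lbfgs converge within the default max_iter
clf = make_pipeline(StandardScaler(), LogisticRegression(C=1))
clf.fit(X, y)

print(clf.named_steps['logisticregression'].n_iter_)  # well under the default limit of 100
print(clf.score(X, y))  # training accuracy
```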
Given the plot above, give a brief answer (a few sentences to a paragraph) to each of the following questions.