Classification¶
MNIST¶
from sklearn.datasets import fetch_openml
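# as_frame=True makes X a DataFrame explicitly; X.iloc and X_train.to_numpy() below rely on that
mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=True)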
mnist.keys()
dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])
X, y = mnist.data, mnist.target
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
INDEX = 0
some_digit = X.iloc[INDEX].values
some_digit_image = some_digit.reshape(28,28)
plt.imshow(some_digit_image,cmap="binary")
plt.axis('off')
plt.show()
y = y.astype(np.int8)
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
Binary Classifier¶
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
print(y_train_5[:5])
0     True
1    False
2    False
3    False
4    False
Name: class, dtype: bool
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train.to_numpy(), y_train_5)
SGDClassifier(random_state=42)
sgd_clf.predict([some_digit])
array([ True])
Performance Measures¶
Out of all positive predictions, how many should I believe? $$ \text{Precision} = PPV = \dfrac{TP}{TP + FP} = \dfrac{TP}{\text{Total Predicted Positives}} $$
Out of all real positives, how many were correctly identified? $$ \text{Recall/Sensitivity} = TPR = \dfrac{TP}{TP + FN} = \dfrac{TP}{\text{Total Actual Positives}} $$
Out of all real negatives, how many were wrongly flagged as positive? $$ FPR = \dfrac{FP}{TN + FP} = \dfrac{FP}{\text{Total Actual Negatives}} $$
Out of all real negatives, how many were correctly identified? $$ \text{Specificity} = TNR = \dfrac{TN}{TN + FP} = \dfrac{TN}{\text{Total Actual Negatives}} $$
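The four rates can be checked by hand from raw confusion-matrix counts. The sketch below is only illustrative: `rates_from_counts` is a hypothetical helper, the counts are made up rather than taken from the MNIST classifier, and precision uses the standard definition $TP / (TP + FP)$.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Compute precision, recall, FPR and specificity from confusion counts.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    return {
        "precision": tp / (tp + fp),    # TP / total predicted positives
        "recall": tp / (tp + fn),       # TP / total actual positives (TPR)
        "fpr": fp / (tn + fp),          # FP / total actual negatives
        "specificity": tn / (tn + fp),  # TN / total actual negatives (TNR)
    }


# Made-up counts, not results from the SGD classifier above
rates = rates_from_counts(tp=80, fp=20, tn=880, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Because FPR and specificity share the same denominator, they always sum to 1.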
$$ F_{1} = \dfrac{2}{\frac{1}{\text{Precision}}+\frac{1}{\text{Recall}}} $$
The harmonic mean gives more weight to low values, so $F_1$ is high only when both precision and recall are high.
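A quick way to see the effect is to compare the harmonic mean with the plain arithmetic mean for a lopsided classifier. This is a minimal sketch: `f1_from` is a hypothetical helper and the precision/recall values are invented for illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision == 0 or recall == 0:
        return 0.0  # avoid division by zero; F1 is 0 when either value is 0
    return 2 / (1 / precision + 1 / recall)


balanced = f1_from(0.80, 0.80)   # both values equal, so F1 equals them
lopsided = f1_from(0.98, 0.10)   # high precision, very low recall
arithmetic = (0.98 + 0.10) / 2   # arithmetic mean of the same pair

print(f"balanced F1:     {balanced:.3f}")
print(f"lopsided F1:     {lopsided:.3f}")
print(f"arithmetic mean: {arithmetic:.3f}")
```

The lopsided pair has an arithmetic mean of 0.54 but an $F_1$ of roughly 0.18, which is much closer to the weaker of the two values.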
Precision: Out of all your positive predictions, how many were correct?
Recall: Out of all the actual positives, how many could you recall?
Multiclass Classification¶
Error Analysis¶
Multilabel Classification¶
- Identify multiple people (labels) in the same image. With labels [Alice, Bob, Charlie], the output [1,0,1] means Alice and Charlie are present and Bob is not. Each label is a binary tag.
- Alice, Bob and Charlie are the labels. Each label has two classes (0 or 1).
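The encoding described above can be sketched in a few lines. `LABELS` and `encode_present` are hypothetical names introduced only for this illustration.

```python
# Fixed label order; each position in the output vector is one binary tag
LABELS = ["Alice", "Bob", "Charlie"]


def encode_present(people_in_image):
    """Return a 0/1 indicator per label for the people found in one image."""
    present = set(people_in_image)
    return [1 if name in present else 0 for name in LABELS]


targets = [
    encode_present(["Alice", "Charlie"]),  # Alice and Charlie, no Bob
    encode_present(["Bob"]),               # only Bob
    encode_present([]),                    # nobody recognized
]
print(targets)
```

Stacking one such row per image gives a 2D indicator array with one column per label, which is the target shape scikit-learn expects for multilabel estimators such as `KNeighborsClassifier`.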
Multioutput Classification¶
- Multioutput-multiclass classification is a generalization of multilabel classification where each label can be multiclass.
- In an image:
  - 1 label per pixel
  - Each label is multiclass (pixel intensity values from 0 to 255)
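To make the target shape concrete, the sketch below builds a toy multioutput dataset. It assumes a denoising-style task (noisy image in, clean image out), which is one common way to get one label per pixel. The images are random arrays rather than MNIST digits, and `X_demo`, `y_demo` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in "clean" images: 5 flattened 28x28 images with intensities 0..255
clean = rng.integers(0, 256, size=(5, 784))

# Add random noise and clip so the inputs stay valid pixel intensities
noise = rng.integers(0, 100, size=clean.shape)
noisy = np.clip(clean + noise, 0, 255)

# Input: noisy pixels. Target: 784 labels per image, each one of 256 classes
X_demo, y_demo = noisy, clean

print(X_demo.shape, y_demo.shape)
print("labels per image:", y_demo.shape[1])
```

Each row of `y_demo` holds 784 labels, and every label takes one of 256 possible values, which is what makes the task both multioutput and multiclass.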