Classification¶
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)
mnist.keys()
dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])
X = mnist.data
y = mnist.target
type(mnist.target[0])
str
X.shape # 28 x 28 px = 784
(70000, 784)
X.head()
 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | pixel10 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 784 columns
y.shape
(70000,)
We will graph an instance's feature vector. An instance here is an image of a digit stored as a row in the dataset.
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
INDEX = 0
some_digit = np.array(X.iloc[INDEX])
some_digit_image = some_digit.reshape(28,28)
plt.imshow(some_digit_image,cmap="binary")
plt.axis('off')
plt.show()
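As a sanity check we can look at the label for this instance; it should be the digit 5, which matches the y_train_5 printout further below. Note the label is still a string at this point:
y[INDEX]  # '5' — labels are strings until the astype(np.int8) cast below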
y = y.astype(np.int8)
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
Binary Classifier¶
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
print(y_train_5)
0         True
1        False
2        False
3        False
4        False
         ...
59995    False
59996    False
59997     True
59998    False
59999    False
Name: class, Length: 60000, dtype: bool
Stochastic Gradient Descent (SGD) Classifier¶
- can handle very large datasets efficiently, since it processes training instances one at a time (which also makes it well suited to online learning)
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)
SGDClassifier(random_state=42)
sgd_clf.predict([some_digit])
y_train_pred_original = sgd_clf.predict(X_train)
from sklearn.model_selection import cross_val_predict
# get a clean (out-of-sample) prediction for every training instance: each prediction is made by a model that never saw that instance during training
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv = 3)
# Implementation of Cross Validation
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone
skfolds = StratifiedKFold(n_splits=3, random_state=42, shuffle=True)
for train_index, test_index in skfolds.split(X_train, y_train_5):
    # creates a new, untrained model using the same hyperparameters as the original
    clone_clf = clone(sgd_clf)
    X_train_folds = X_train.iloc[train_index]
    y_train_folds = y_train_5.iloc[train_index]
    X_test_folds = X_train.iloc[test_index]
    y_test_folds = y_train_5.iloc[test_index]
    clone_clf.fit(X_train_folds, y_train_folds)
    y_pred = clone_clf.predict(X_test_folds)
    n_correct = sum(y_pred == y_test_folds)
    print(n_correct / len(y_pred))
0.9669
0.91625
0.96785
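For comparison, `cross_val_score` computes per-fold accuracies in one call; a minimal sketch (the fold splits differ slightly from the shuffled StratifiedKFold above, so the numbers will be comparable rather than identical):
from sklearn.model_selection import cross_val_score
# accuracy of the "is it a 5?" classifier on each of 3 cross-validation folds
cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")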
from sklearn.metrics import confusion_matrix
y_train_perfect_predictions = y_train_5
confusion_matrix(y_train_5, y_train_perfect_predictions)
array([[54579,     0],
       [    0,  5421]], dtype=int64)
from sklearn.metrics import precision_score, recall_score
prec = precision_score(y_train_5, y_train_pred)
rec = recall_score(y_train_5, y_train_pred)
f1 = 2/(1/prec + 1/rec)
f1
0.7325171197343847
from sklearn.metrics import f1_score
f1_score(y_train_5,y_train_pred)
0.7325171197343846
Precision & Recall¶
Precision refers to "How many times were you right about a Positive?" This measures the accuracy of positive predictions.
$$ \text{precision} = \frac{TP}{TP + FP} $$
Recall answers "How many of the actual positives did you catch?" It is also called sensitivity or the True Positive Rate (TPR). Another way to think about recall: how well can you DETECT the positives? Do you catch all of them, or are there a few misses?
$$ \text{recall} = \frac{TP}{TP + FN} $$
Why do we need both precision and recall?
Think of it this way: suppose I correctly identified 10 positives, so $TP = 10$. Say I was very careful and made no false positives, $FP = 0$ (e.g. never deem the innocent guilty, even if it means letting some criminals go free). Being that cautious means some actual positives slip through, so $FN \neq 0$.
This means precision stays at $100\%$ (we were careful to avoid any $FP$), but recall drops, since we are letting some criminals go ($FN \neq 0$). A tiny numeric check follows the bullet points below.
- Need high precision when it is important to avoid false positives, typically when marking something as safe: 'safe for children' videos, 'safe to eat', 'safe from viruses', safe to let someone into your house.
- Need high recall when it is important to catch every positive, e.g. catching shoplifters: more false alarms are inconvenient, but the priority is to "GET ALL THE POSITIVES RIGHT", even at the cost of more false positives.
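On hypothetical toy labels (not MNIST):
from sklearn.metrics import precision_score, recall_score
# 4 actual positives; the classifier flags 3 instances, 2 of them correctly
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
print(precision_score(y_true, y_pred))  # TP=2, FP=1 -> 2/3 ≈ 0.67
print(recall_score(y_true, y_pred))     # TP=2, FN=2 -> 2/4 = 0.50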
Precision-Recall Tradeoff¶
The SGD classifier computes a decision score for each instance; if the score is above a threshold, the instance is classified as positive, otherwise as negative.
`sklearn` doesn't let us set this threshold directly, but it does give us access to the decision scores it uses to make predictions: we can call `decision_function()`.
sgd_clf.decision_function([some_digit])
array([2164.22030239])
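Given the raw score, we can apply any threshold we choose; a minimal sketch (the 8000 is an arbitrary illustrative value):
# threshold = 0 reproduces the classifier's default behaviour
threshold = 0
print(sgd_clf.decision_function([some_digit]) > threshold)
# raising the threshold makes the classifier more conservative (higher precision, lower recall)
threshold = 8000
print(sgd_clf.decision_function([some_digit]) > threshold)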
How to decide which threshold to use?
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function")
from sklearn.metrics import precision_recall_curve
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)
thresholds
array([-106527.45300471, -105763.22240074, -105406.2965229 , ..., 38871.26391927, 42216.05562787, 49441.43765905])
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    plt.legend(loc="center right", fontsize=16)
    plt.xlabel("Threshold", fontsize=16)
    plt.grid(True)
    plt.axis([-50000, 50000, 0, 1])
plt.figure(figsize=(8,4))
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
recall_90_precision = recalls[np.argmax(precisions > 0.9)]
thresholds_90_precision = thresholds[np.argmax(precisions > 0.9)]
plt.plot([-50000, thresholds_90_precision],[0.9, 0.9], 'r:')
plt.plot([thresholds_90_precision, thresholds_90_precision],[0.9,0.], 'r:')
plt.plot([-50000, thresholds_90_precision],[recall_90_precision,recall_90_precision], 'r:')
plt.plot([thresholds_90_precision],[0.9],'ro')
plt.plot([thresholds_90_precision],[recall_90_precision],'ro')
plt.show()
y_train_pred_90 = (y_scores >= thresholds_90_precision)
print(precision_score(y_train_5, y_train_pred_90))
print(recall_score(y_train_5, y_train_pred_90))
0.9000345901072293
0.4799852425751706
The ROC Curve¶
Receiver Operating Characteristic
Ratio of positive instances correctly classified as positive:
$$ \text{Recall/Sensitivity} = TPR = \dfrac{TP}{TP + FN} = \dfrac{TP}{\text{Total Actual Positives}} $$
Ratio of negative instances incorrectly classified as positive:
$$ FPR = \dfrac{FP}{TN + FP} = \dfrac{FP}{\text{Total Actual Negatives}} $$
Ratio of negative instances correctly classified as negative:
$$ \text{Specificity} = TNR = \dfrac{TN}{TN + FP} = \dfrac{TN}{\text{Total Actual Negatives}} $$
How to decide between ROC and PR?
Prefer the PR curve when the positive class is rare, or when we care more about false positives than false negatives (e.g. "Is a claim fraudulent?": both FP and FN are costly, but the positive class (fraud) is rare, so the PR curve is the better choice).
Receiver Operating Characteristic - Area under the Curve (ROC AUC) is a metric used to compare different classifiers.
- Purely random: AUC = 0.5
- Perfect Classifier: AUC = 1.0
`sklearn` classifiers generally provide one or the other: `decision_function()` or `predict_proba()`.
# ROC Curve plots sensitivity (recall) vs 1- specificity
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)
def plot_roc_curve(fpr, tpr, label=None):
    plt.plot(fpr, tpr, linewidth=2, label=label)
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlabel("False Positive Rate", fontsize=16)
    plt.ylabel("True Positive Rate (Recall)", fontsize=16)
    plt.axis([0, 1, 0, 1])
    plt.grid(True)
# FPR at the point where the TPR reaches the recall achieved at 90% precision
plt.figure(figsize=(8,6))
fpr_90_precision = fpr[np.argmax(tpr >= recall_90_precision)]
plot_roc_curve(fpr, tpr)
plt.plot([fpr_90_precision,fpr_90_precision],[0, recall_90_precision],'r:')
plt.plot([fpr_90_precision],[recall_90_precision],'ro')
plt.show()
Both the ROC and the precision-recall curves are computed at many threshold values, which is why roc_curve and precision_recall_curve return arrays of FPR/TPR (or precision/recall) values that we can plot.
The tradeoff in ROC: the higher the recall (TPR), the higher the FPR.
A purely random classifier is represented by the dotted diagonal line.
from sklearn.metrics import roc_auc_score
roc_auc_score(y_train_5, y_scores)
0.9604938554008616
from sklearn.ensemble import RandomForestClassifier
forest_clf = RandomForestClassifier(random_state=42)
y_probas_forest = cross_val_predict(forest_clf, X_train, y_train_5, cv=3, method='predict_proba')
y_scores_forest = y_probas_forest[:, 1]
y_scores_forest
array([0.89, 0.01, 0.04, ..., 0.98, 0.08, 0.06])
fpr_forest, tpr_forest, thresholds_forest = roc_curve(y_train_5, y_scores_forest)
recall_90_precision_forest = tpr_forest[np.argmax(fpr_forest >= fpr_90_precision)]  # forest's recall at the same FPR
plt.plot(fpr, tpr, 'b:', label="SGD")
plot_roc_curve(fpr_forest, tpr_forest, label="Random Forest")
plt.legend(loc="lower right")
plt.show()
roc_auc_score(y_train_5, y_scores_forest)
0.9983436731328145
y_pred_forest = cross_val_predict(forest_clf, X_train, y_train_5, cv=3)
print(precision_score(y_train_5,y_pred_forest))
print(recall_score(y_train_5, y_pred_forest))
0.9905083315756169
0.8662608374838591
Just lost 4 hours of work because I refreshed the Colab notebook :/
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
y_train_pred = cross_val_predict(sgd_clf, X_train_scaled, y_train, cv=3)
conf_mx = confusion_matrix(y_train, y_train_pred)
conf_mx
array([[5577,    0,   22,    5,    8,   43,   36,    6,  225,    1],
       [   0, 6400,   37,   24,    4,   44,    4,    7,  212,   10],
       [  27,   27, 5220,   92,   73,   27,   67,   36,  378,   11],
       [  22,   17,  117, 5227,    2,  203,   27,   40,  403,   73],
       [  12,   14,   41,    9, 5182,   12,   34,   27,  347,  164],
       [  27,   15,   30,  168,   53, 4444,   75,   14,  535,   60],
       [  30,   15,   42,    3,   44,   97, 5552,    3,  131,    1],
       [  21,   10,   51,   30,   49,   12,    3, 5684,  195,  210],
       [  17,   63,   48,   86,    3,  126,   25,   10, 5429,   44],
       [  25,   18,   30,   64,  118,   36,    1,  179,  371, 5107]], dtype=int64)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums
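The normalized matrix is easier to read as an image. A sketch of the error plot, following the book's approach of zeroing the diagonal so that only the errors stand out:
# keep only the error rates: correct predictions on the diagonal would otherwise dominate the colour scale
np.fill_diagonal(norm_conf_mx, 0)
plt.matshow(norm_conf_mx, cmap=plt.cm.gray)
plt.xlabel("Predicted class")
plt.ylabel("Actual class")
plt.show()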
Multiclass Classification¶
There are multiple types of classifiers:
- SGD
- Random Forest Classifier
- Naive Bayes
These have native support for multiple classes. But there are others that don't, such as:
- Support Vector Machine Classifiers (SVCs)
- Logistic Regressions
They are strictly binary in nature. However, we can use strategies like OvR (One vs Rest) or OvO (One vs One) to apply these models to multiclass problems. Say we have $N$ classes, each covering roughly $\frac{1}{N}$ of the data (a sketch of both strategies follows the bullets):
- OvR fits $N$ binary models, each one treating a single class as positive and all the rest as negative, so every model is trained on 100% of the data.
- OvO fits $\dbinom{N}{2}$ binary models, one per pair of classes: $(0,1), (0,2), \dots, (1,2), (1,3), \dots$ Each model only sees the data of its two classes, i.e. roughly $\frac{2}{N}$ of the dataset (20% for the 10 MNIST digits), but the number of models grows quickly: 10 classes already require 45 models. OvO is therefore attractive for models that don't scale well with dataset size, e.g. SVCs.
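A minimal sketch of forcing either strategy explicitly (training on a small slice of the data here, since SVC scales poorly with dataset size):
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# OvR: one "this digit vs. everything else" model per class
ovr_clf = OneVsRestClassifier(SVC())
ovr_clf.fit(X_train[:1000], y_train[:1000])
print(ovr_clf.predict([some_digit]))

# OvO: one model per pair of classes (the strategy SVC uses internally for multiclass)
ovo_clf = OneVsOneClassifier(SVC())
ovo_clf.fit(X_train[:1000], y_train[:1000])
print(len(ovo_clf.estimators_))  # 45 models for the 10 digit classes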
Variations¶
- Multiclass: the single label being predicted can take more than two values (e.g. the digits 0, 1, 2, 3, ...).
- Multilabel: each instance gets multiple labels, each of which is binary.
- Multioutput: a generalization of multilabel where there are multiple labels and each label can itself be multiclass.
In essence, the other settings are special cases of a multioutput model. $$ \text{Multilabel} + \text{Multiclass} = \text{Multioutput} $$
The author shows an insane example for multioutput where he trains a KNN classifier to remove the noise from a noisy digit image (sketched at the end of the Multioutput Classification section below).
Multilabel classification¶
from sklearn.neighbors import KNeighborsClassifier
y_train_large = (y_train >= 7)
y_train_odd = (y_train % 2 == 1)
y_multilabel = np.c_[y_train_large, y_train_odd]
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)
KNeighborsClassifier()
knn_clf.predict([some_digit])
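The two labels here mean (is the digit ≥ 7?, is it odd?), so for the 5 we plotted earlier the correct answer is [False, True]: 5 is not large, but it is odd.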
y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)
$F_{1}$ score¶
This is the harmonic mean of precision and recall, which gives more weight to low values, so we only get a high $F_{1}$ score when both precision and recall are high. $$ F_{1} = \dfrac{2}{\dfrac{1}{\text{Precision}} + \dfrac{1}{\text{Recall}}} $$
Confusion Matrix¶
The confusion matrix gives a much more detailed picture of a binary classifier's performance than a single accuracy number.
from sklearn.metrics import confusion_matrix
y_train_perfect_predictions = y_train_5
confusion_matrix(y_train_5, y_train_perfect_predictions)
array([[54579,     0],
       [    0,  5421]], dtype=int64)
Each row represents an actual class and each column a predicted class; for this binary case the first row/column is the negative class (non-5) and the second is the positive class (5).
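A tiny toy example of how to read the counts (hypothetical labels, not MNIST):
from sklearn.metrics import confusion_matrix
# 6 toy instances: 4 actual negatives, 2 actual positives
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1]
confusion_matrix(y_true, y_pred)
# array([[3, 1],     row 0 = actual negatives: 3 TN, 1 FP
#        [1, 1]])    row 1 = actual positives: 1 FN, 1 TP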
from sklearn.metrics import f1_score
f1_score(y_multilabel, y_train_knn_pred, average='macro')
0.976410265560605
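Note that average='macro' gives every label equal weight; if the labels are imbalanced, average='weighted' weights each label's score by its support instead.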
Multioutput Classification¶
Each label can itself be multiclass, i.e. each label can take more than two possible values. In other words:
$$ \text{Multilabel} + \text{Multiclass} = \text{Multioutput} $$
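A sketch of the noise-removal example mentioned in the Variations section, assuming the book's setup: add random pixel noise to each image and train a KNN classifier to map the noisy image back to the clean one (784 labels, each with 256 possible pixel values):
# build the multioutput task: input = noisy digit, target = original clean digit
noise_train = np.random.randint(0, 100, (len(X_train), 784))
noise_test = np.random.randint(0, 100, (len(X_test), 784))
X_train_mod = X_train + noise_train
X_test_mod = X_test + noise_test
y_train_mod = X_train
y_test_mod = X_test

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_mod, y_train_mod)

# "predict" (i.e. denoise) one noisy test image and display the result
clean_digit = knn_clf.predict(X_test_mod.iloc[[0]])
plt.imshow(clean_digit.reshape(28, 28), cmap="binary")
plt.axis('off')
plt.show()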