Precision & Recall
Precision refers to “How many times were you right about a Positive?” This measures the accuracy of positive predictions.
Recall refers to “How many positives could you recall correctly?“. Also refers to sensitivity or True Positive Rate (TPR). Another way to think about recall is how well you are able to DETECT the positives? Are you able to detect all of them or are there a few misses?
Hey
Why do we need both precision and recall?
Think of it this way. If I correctly identified 10 positives,
. Say, I was very careful, so I didn’t make any false positives, (e.g. never deem the innocent guilty even if it means you will have to let go of some criminals), so . This means our precision remains
(careful to ensure there are no ) but our recall would reduce, since we are letting some criminals go ( ).
- Need high precision when its important not to have any False positives (mark as safe. Like ‘safe for children’ videos or ‘safe for eating’ or ‘safe from virus’ to allow someone into your house)
- Need high recall when its important to catch shoplifters, it will be more inconvenient but its more important to “GET ALL THE POSITIVES RIGHT”, even when there are more false positives.
score
This is the harmonic mean of the precision and recall and gives more value to lower values. So, we get a higher
Confusion Matrix
The confusion matrix gives a good idea about the accuracy of binary classifiers
python
from sklearn.metrics import confusion_matrixy_train_perfect_predictions = y_train_5confusion_matrix(y_train_5, y_train_perfect_predictions)
Each row represents actual class (Positive / Negative). Each column represents predicted class (+ / -)
array([[54579, 0], # non-fives (Negative)[ 0, 5421]], dtype=int64) # fives (Positive)
ROC Curve
Receiver Operating Characteristic is the plot between the
Sensitivity or Recall
vs1-Specificity or FPR
Ratio of Positive instances correctly classified (as positive)
Ratio of Negative instances, incorrectly classified (as positive)
Ratio of Negative instances, correctly classified (as negative)
How to decide between ROC and PR?
If positive class is rare or when we care more about false positives than false negatives (e.g. Is a claim fradulent? Positive = fraudulant. In this case we will choose PR because both FP and FN are costly, but the primary concern is the class imbalance. Positive class is rare and thus we choose PR curves.)
Receiver Operating Characteristic - Area under the Curve (ROC AUC) is a metric used to compare different classifiers.
- Purely random: AUC = 0.5
- Perfect Classifier: AUC = 1.0
sklearn
classifiers generally have one or the other
decision_function()
predict_proba()
Multivariate Classification
There are multiple types of classifiers:
- SGD
- Random Forest Classifier
- Naive Bayes
These have native support for multiple classes. But there are others that don’t like
- Support Vector Machine Classifiers (SVCs)
- Logistic Regressions
They are strictly binary in nature. However, we can use alternate strategies like OvR
(One vs Rest) and OvO
(One vs One) strategy in order to use these models for multivariate classification. Say, we have
OvR
fitsbinary models. Each model checking for one (positive) and the rest (negative). So, on each model we have to fit 100% of the data. OvO
fitsbinary models, i.e. pairs of each class against others. Thus each model we have to fit contains only 20% of the data. But the number of models to be fit would be huge (e.g. 10 classes will require 45 models to be fit). So, this can be utilized by models that don’t scale well with data. E.g. SVC
s.
Variations
- Multiclass can have multiple classes (e.g. 0,1,2,3…) for the singular label that is being predicted.
- Multilabel can have multiple labels (each having a binary class)
- Multioutput is a generalization of multilabel where there are multiple labels and each label can have multiple classes.
In essence, every model is just a specification of a multioutput model.
The author shows an insane example for multioutput where he trains a model to remove the noise from a noisy digit using the
knn
classifier.