Scikit Learn Notes
sklearn Legend¶
| Import | Task |
|---|---|
| `sklearn.model_selection` | Selecting between models; also train-test splitting |
| `sklearn.impute` | Handling missing values |
| `sklearn.preprocessing` | Changing the form of the data to make it machine friendly, e.g. ordinal or categorical values \(\to\) numbers |
| `sklearn.base` | Base classes for creating custom estimators and transformers |
| `sklearn.pipeline` | Creating pipelines |
| `sklearn.compose` | Column transformers (handle categorical and numerical columns in the same transformation) |
| `sklearn.metrics` | Score functions; performance metrics; distance computations |
| `sklearn.tree` | Decision-tree-based models |
| `sklearn.ensemble` | Various ensemble methods (e.g. `RandomForestRegressor`) |
| `sklearn.svm` | Support Vector Machine models |
| `sklearn.multiclass` | `OneVsRestClassifier` and `OneVsOneClassifier`, to override the default multiclass behavior of binary classifiers |
| `sklearn.neighbors` | Distance-based (nearest-neighbor) models |
Design Philosophy¶
Consistency¶
All objects share a consistent and simple interface
| Object | Explanation | Example |
|---|---|---|
| Estimators | Any object that can estimate some parameters based on a dataset. The estimation is done by the `.fit()` method, which takes only a dataset as a parameter (or two, data and labels, for supervised learning algorithms). Any other parameter is considered a hyperparameter (e.g. `strategy`). | `SimpleImputer` (called `Imputer` in older versions) |
| Transformers | Any object that can transform a dataset. The transformation is performed by the `.transform()` method, which takes the dataset to be transformed as input. `.fit()` and `.transform()` can conveniently be called together (with possible optimization) via the `.fit_transform()` method. | `SimpleImputer` (called `Imputer` in older versions) |
| Predictors | Any object capable of making predictions. The `.predict()` method takes a dataset of new instances and returns a dataset of corresponding predictions. A predictor also has a `.score()` method that measures the quality of the predictions, given a test set (with labels, in the case of supervised learning). | |
Having learnt the above, now try implementing an [[Imputing Data using Scikit-Learn|imputing strategy in Scikit-Learn]].
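The three roles can be sketched on a toy dataset. This is a minimal illustration, assuming a recent scikit-learn where the imputer class is `SimpleImputer` in `sklearn.impute`; the data values and the choice of `LinearRegression` as the predictor are made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Toy data (invented for illustration): one missing value in the first column.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [3.0, 6.0], [4.0, 8.0]])
y = np.array([3.0, 5.0, 9.0, 12.0])

# Estimator: fit() learns parameters from the data; strategy is a hyperparameter.
imputer = SimpleImputer(strategy="median")
imputer.fit(X)

# Transformer: transform() returns the transformed dataset.
X_filled = imputer.transform(X)

# fit_transform() performs both steps in one call.
X_filled_again = SimpleImputer(strategy="median").fit_transform(X)

# Predictor: predict() returns predictions, score() measures their quality.
model = LinearRegression()
model.fit(X_filled, y)            # supervised: data and labels
predictions = model.predict(X_filled)
r2 = model.score(X_filled, y)     # R^2 for regressors
```

Here the same object (`imputer`) acts as both an estimator and a transformer, while `model` is an estimator and a predictor.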
Inspection¶
- An estimator's hyperparameters are accessible directly via public instance variables, e.g. `imputer.strategy`
- An estimator's learned parameters are accessible in a similar fashion but with an `_` (underscore) suffix, e.g. `imputer.statistics_`
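A short sketch of both kinds of attribute, assuming `imputer` is a `SimpleImputer` from `sklearn.impute` (the data values are invented for the example):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data with one missing value per column.
X = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean").fit(X)

# Hyperparameter: set by the user, no underscore suffix.
print(imputer.strategy)

# Learned parameter: computed by fit(), underscore suffix.
# Here it holds the per-column means of the non-missing values.
print(imputer.statistics_)
```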
Nonproliferation of Classes¶
- Datasets, including all outputs, are `numpy` arrays or `scipy` sparse matrices rather than custom classes
- Hyperparameters are regular Python strings or numbers
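As a rough check of the output types, the sketch below runs two common transformers with their default settings. The choice of `StandardScaler` and `OneHotEncoder` and the toy data are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data, invented for illustration.
X_num = np.array([[1.0], [2.0], [3.0]])
X_cat = np.array([["red"], ["green"], ["red"]])

# Dense output: a plain NumPy array.
scaled = StandardScaler().fit_transform(X_num)

# With default settings, OneHotEncoder returns a SciPy sparse matrix.
encoded = OneHotEncoder().fit_transform(X_cat)

print(type(scaled))
print(sparse.issparse(encoded))
print(encoded.toarray())  # convert to a dense array when needed
```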
Composition¶
Existing building blocks
1. Can be reused
2. Can be combined to create a Pipeline (arbitrary sequence of transformations followed by a final estimator)
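A minimal sketch of such a composition, chaining two transformers and a final estimator. The step names, the toy data, and the choice of `LinearRegression` as the final estimator are illustrative assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [3.0, 6.0], [4.0, 8.0]])
y = np.array([3.0, 5.0, 9.0, 12.0])

# Each step is a (name, object) pair; all but the last must be transformers.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])

# fit() calls fit_transform() on each transformer in turn,
# then fit() on the final estimator.
pipeline.fit(X, y)
predictions = pipeline.predict(X)

# Individual fitted steps remain accessible by name.
fitted_imputer = pipeline.named_steps["imputer"]
print(fitted_imputer.statistics_)
```

The pipeline itself exposes the interface of its final estimator, so it can be used anywhere a single predictor would be.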
Sensible defaults¶
- Most parameters have reasonable default values
- As a result, a baseline system is quick to create
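For instance, a baseline classifier can be trained without setting any model hyperparameters. This sketch assumes the bundled iris dataset and a `RandomForestClassifier`; only `random_state` is set, for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No hyperparameters tuned: every setting keeps its default value.
baseline = RandomForestClassifier(random_state=42)
baseline.fit(X_train, y_train)

accuracy = baseline.score(X_test, y_test)  # mean accuracy for classifiers
params = baseline.get_params()             # inspect the defaults in use
print(accuracy)
```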
Models¶
Classifiers¶
| Variable | Model | Score method | Multiclass support |
|---|---|---|---|
| `sgd` | Stochastic Gradient Descent | uses `.decision_function()` | Multiple |
| `forest` | Random Forest Classifier | uses `.predict_proba()` | Multiple |
| | Naive Bayes | | Multiple |
| `svc` | Support Vector Machine Classifier | uses `.decision_function()` | Binary w/ OvO strategy |
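The score methods in the table can be compared on a multiclass problem. The sketch below assumes the bundled digits dataset (10 classes) and reuses the variable names `sgd`, `forest`, and `svc` from the table; `ovr_svc` is a hypothetical name for an SVC wrapped in `OneVsRestClassifier`.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 10 classes
sample = X[:1]                       # a single instance

# SGD exposes decision_function(): one score per class.
sgd = SGDClassifier(random_state=42).fit(X, y)
sgd_scores = sgd.decision_function(sample)

# Random forest exposes predict_proba(): one probability per class.
forest = RandomForestClassifier(random_state=42).fit(X, y)
forest_proba = forest.predict_proba(sample)

# SVC trains one binary classifier per pair of classes (OvO);
# asking for the "ovo" shape returns one score per pair: 10 * 9 / 2 = 45.
svc = SVC(decision_function_shape="ovo").fit(X, y)
svc_scores = svc.decision_function(sample)

# Overriding the strategy: force one-vs-rest, i.e. one SVC per class.
ovr_svc = OneVsRestClassifier(SVC()).fit(X, y)

print(sgd_scores.shape, forest_proba.shape, svc_scores.shape)
print(len(ovr_svc.estimators_))
```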