sklearn
Legend
import | task |
---|---|
sklearn.model_selection | Helps in selecting between models. Also, splitting train-test |
sklearn.impute | Missing Values |
sklearn.preprocessing | Change the form of data. Ordinal Categorical |
sklearn.base | Access base classes to create custom estimator , transformer |
sklearn.pipeline | Create pipelines |
sklearn.compose | Access to column transformers (take care of categorical and numerical columns in the same transformation) |
sklearn.metrics | Score functions; Performance metrics; Distance computations |
sklearn.tree | Decision tree based models |
sklearn.ensemble | Various ensemble methods (e.g. RandomForestRegressor ) |
sklearn.svm | Dedicated for Support Vector Machines??? |
sklearn.multiclass | Access to OneVsRestClassifier and OneVsOneClassifier to override default multiclass behavior in Binary classifiers |
sklearn.neighbors | Distance based models |
Design Philosophy
Consistency
All objects share a consistent and simple interface
Object | Explanation | Example |
---|---|---|
Estimators | - Any object that can estimate some parameters based on the estimator - The estimation is done by the .fit() method.- Takes only a dataset as a parameter (or 2 (data and labels) for supervised learning algorithms) - Any other parameter is considered as a hyperparameter (e.g. strategy) | Imputer |
Transformers | - Any object that can transform the dataset. - The transformation is performed by the .transform() method, which takes input as a dataset to be transformed- Both .fit() and .transform() can be conveniently be called together (with possible optimization) via the .fit_transform() function | Imputer |
Predictors | - Capable of making predictions. - The .predict() method takes a dataset of new instances and returns a dataset of corresponding predictions.- A predictor also has a .score() method that measures the quality of predictions, given a test set (w/ labels, in case of supervised learning). |
Having learnt the above, now try implementing an [[Imputing Data using Scikit-Learn|imputing strategy in Scikit-Learn]].
Inspection
- Estimator’s hyperparameters are accessible directly via public instance variables, e.g.
imputer.strategy
- Estimator’s learned parameters are accessible in a similar fashion but with an
_
(underscore) suffix, e.g.imputer.statistics_
Nonproliferation of Classes
- All outputs given in the form of
numpy
arrays orscipy
sparse matrices. - Hyperparameters are regular python
string
ornumbers
Composition
Existing building blocks
- Can be reused
- Can be combined to create a
Pipeline
(arbitrary sequence of transformations followed by a final estimator)
Sensible defaults
- Most common defaults
- Baseline system is quick to create
Models
Classifiers
Model | |||
---|---|---|---|
sgd | Stochastic Gradient Descent | uses .decision_function() | Multiple |
forest | Random Forest Classifier | uses .predict_proba() | Multiple |
Naive Bayes | Multiple | ||
svc | Support Vector Machine Classifier | uses .decision_function() | Binary w/ OvO strategy |