Hursh Gupta / Saturday, May 31, 25

Scikit-Learn Notes

`sklearn` Legend

import	task
`sklearn.model_selection`	Helps in selecting between models (cross-validation, hyperparameter search). Also handles the train-test split
`sklearn.impute`	Handle missing values (imputation)
`sklearn.preprocessing`	Change the form of the data to make it machine friendly, e.g. encode ordinal and categorical features as numbers
`sklearn.base`	Access base classes to create custom `estimator`, `transformer`
`sklearn.pipeline`	Create pipelines
`sklearn.compose`	Access to column transformers (take care of categorical and numerical columns in the same transformation)
`sklearn.metrics`	Score functions; Performance metrics; Distance computations
`sklearn.tree`	Decision tree based models
`sklearn.ensemble`	Various ensemble methods (e.g. `RandomForestRegressor`)
`sklearn.svm`	Support Vector Machine models (e.g. `SVC`)
`sklearn.multiclass`	Access to `OneVsRestClassifier` and `OneVsOneClassifier` to override the default multiclass strategy of binary classifiers
`sklearn.neighbors`	Distance-based (nearest neighbors) models
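As an illustration of the `sklearn.base` entry in the legend, here is a minimal sketch of a custom transformer. The class `ColumnClipper` and its `lower`/`upper` hyperparameters are hypothetical names invented for this example; only `BaseEstimator` and `TransformerMixin` come from Scikit-Learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clips each column to percentiles learned in fit()."""

    def __init__(self, lower=5, upper=95):
        # Hyperparameters are stored as-is under the same names as the arguments.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned parameters get a trailing underscore.
        self.lower_bounds_ = np.percentile(X, self.lower, axis=0)
        self.upper_bounds_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.lower_bounds_, self.upper_bounds_)


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, 40.0]])
clipper = ColumnClipper(lower=0, upper=75)
X_clipped = clipper.fit_transform(X)  # fit_transform() is provided by TransformerMixin
```

Inheriting from `BaseEstimator` provides `get_params()` and `set_params()`, which is what lets the object take part in pipelines and hyperparameter searches.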

Design Philosophy

Consistency

All objects share a consistent and simple interface

Object	Explanation	Example
Estimators	- Any object that can estimate some parameters based on a dataset. - The estimation is done by the `.fit()` method. - `.fit()` takes only a dataset as a parameter (or two, data and labels, for supervised learning algorithms). - Any other parameter needed to guide the estimation is considered a `hyperparameter` (e.g. `strategy`)	Imputer
Transformers	- Any estimator that can also transform a dataset. - The transformation is performed by the `.transform()` method, which takes the dataset to be transformed as input. - `.fit()` and `.transform()` can conveniently be called together (with possible optimization) via the `.fit_transform()` method	Imputer
Predictors	- Capable of making predictions. - The `.predict()` method takes a dataset of new instances and returns a dataset of corresponding predictions. - A predictor also has a `.score()` method that measures the quality of predictions, given a test set (w/ labels, in case of supervised learning).
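The three roles in the table can be sketched as follows. This is a minimal example under some assumptions: `SimpleImputer` (from `sklearn.impute`) stands in for the "Imputer" named in the table, `LinearRegression` is my own choice of predictor, and the toy data is made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Toy data (made up): one feature with a missing value.
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# Estimator: fit() learns the column median; "strategy" is a hyperparameter.
imputer = SimpleImputer(strategy="median")
imputer.fit(X)

# Transformer: transform() applies what was learned.
X_filled = imputer.transform(X)

# fit_transform() does both steps in one call.
X_filled_again = SimpleImputer(strategy="median").fit_transform(X)

# Predictor: fit() on data and labels, then predict() and score().
model = LinearRegression()
model.fit(X_filled, y)
predictions = model.predict(np.array([[5.0]]))
r2 = model.score(X_filled, y)  # R^2 for regressors
```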

Having learnt the above, now try implementing an [[Imputing Data using Scikit-Learn|imputing strategy in Scikit-Learn]].

Inspection

An estimator’s hyperparameters are accessible directly via public instance variables, e.g. `imputer.strategy`
An estimator’s learned parameters are accessible in a similar fashion but with an `_` (underscore) suffix, e.g. `imputer.statistics_`
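A small sketch of both kinds of attribute, assuming `SimpleImputer` as the imputer and made-up data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 10.0], [3.0, np.nan], [np.nan, 30.0]])

imputer = SimpleImputer(strategy="mean")
hyperparameter = imputer.strategy  # available before fitting

# Learned parameters only exist after fit().
has_statistics_before_fit = hasattr(imputer, "statistics_")

imputer.fit(X)
learned = imputer.statistics_  # per-column means learned from X
```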

Nonproliferation of Classes

Datasets are represented as NumPy arrays or SciPy sparse matrices instead of custom classes.
Hyperparameters are regular Python strings or numbers.
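To make this concrete, here is a short sketch with two transformers from `sklearn.preprocessing` (my choice for illustration; the data is made up). With default settings, `StandardScaler` returns a NumPy array and `OneHotEncoder` returns a SciPy sparse matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X_num = np.array([[1.0], [2.0], [3.0]])
scaled = StandardScaler().fit_transform(X_num)  # dense NumPy array

X_cat = np.array([["red"], ["green"], ["red"]])
encoded = OneHotEncoder().fit_transform(X_cat)  # sparse matrix by default

is_dense = isinstance(scaled, np.ndarray)
is_sparse = sparse.issparse(encoded)
dense_view = encoded.toarray()  # convert to a NumPy array when needed
```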

Composition

Existing building blocks

Can be reused
Can be combined to create a Pipeline (arbitrary sequence of transformations followed by a final estimator)
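A minimal `Pipeline` sketch: two transformers followed by a final estimator. The step names, the choice of `DecisionTreeRegressor`, and the toy data are assumptions made for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0, 100.0], [2.0, np.nan], [3.0, 300.0], [4.0, 400.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # transformer
    ("scale", StandardScaler()),                    # transformer
    ("model", DecisionTreeRegressor(random_state=0)),  # final estimator
])

# fit() runs fit_transform() on each transformer in order, then fit() on the model.
pipeline.fit(X, y)

# predict() runs transform() on each transformer, then predict() on the model.
preds = pipeline.predict(X)
step_names = list(pipeline.named_steps)
```

The pipeline exposes the same methods as its final estimator, so it can be used anywhere a plain model could.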

Sensible defaults

Most parameters have reasonable default values
A working baseline system is therefore quick to create
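As a sketch of a quick baseline, the example below trains a `RandomForestClassifier` with every hyperparameter left at its default (only `random_state` is set, for reproducibility). The iris dataset is my own choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No tuning: rely on the library defaults for a first baseline.
baseline = RandomForestClassifier(random_state=42)
baseline.fit(X_train, y_train)

accuracy = baseline.score(X_test, y_test)  # mean accuracy on the test set
defaults = baseline.get_params()           # inspect which defaults were used
```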

Models

Classifiers

Variable	Model	Score method	Multiclass support
`sgd`	Stochastic Gradient Descent Classifier	uses `.decision_function()`	Multiple (combines binary classifiers one-vs-rest internally)
`forest`	Random Forest Classifier	uses `.predict_proba()`	Multiple
	Naive Bayes		Multiple
`svc`	Support Vector Machine Classifier	uses `.decision_function()`	Binary; multiclass via OvO strategy
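The score methods in the table can be compared on a 10-class problem. This is a sketch only: the digits dataset is my own choice, the variable names follow the table, and `ovr_svc` is a hypothetical name showing how `sklearn.multiclass` can force a one-vs-rest strategy on `SVC`.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 10 classes
sample = X[:1]

# SGD: one decision score per class.
sgd = SGDClassifier(random_state=42).fit(X, y)
sgd_scores = sgd.decision_function(sample)

# Random forest: one probability per class.
forest = RandomForestClassifier(random_state=42).fit(X, y)
forest_proba = forest.predict_proba(sample)

# SVC trains one binary classifier per pair of classes (OvO);
# decision_function_shape="ovo" exposes those pairwise scores.
svc = SVC(decision_function_shape="ovo").fit(X, y)
svc_scores = svc.decision_function(sample)

# Override the default strategy: one binary SVC per class.
ovr_svc = OneVsRestClassifier(SVC()).fit(X, y)
n_ovr_estimators = len(ovr_svc.estimators_)
```

With 10 classes, OvO needs 10 × 9 / 2 = 45 pairwise classifiers, whereas one-vs-rest needs only 10.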

Attributes