Pipeline

  • Every step except the last must be a transformer (it implements fit and transform); the last step can be any estimator, such as a classifier.
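from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Each step is a (name, estimator) tuple; the final step is the model.
steps = [
    ("imputation", SimpleImputer()),
    ("logistic_regression", LogisticRegression()),
]
pipeline = Pipeline(steps)
pipeline.fit(X_train, y_train)    # fits the imputer, then the classifier
pipeline.score(X_test, y_test)    # mean accuracy on the test set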
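
The snippet above assumes X_train, X_test, y_train and y_test already exist. A minimal end-to-end sketch follows; the synthetic dataset, the 10% missing-value rate and the random seeds are assumptions made purely for illustration and are not part of the original notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data (assumed for illustration): 300 samples, 5 features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Knock out roughly 10% of the values so the imputer has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("imputation", SimpleImputer()),                 # transformer
    ("logistic_regression", LogisticRegression()),   # final estimator
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)

# Fitted steps can be looked up by the name given in the steps list.
imputer = pipeline.named_steps["imputation"]
print(imputer.statistics_.shape)  # one fill value per feature
print(accuracy)
```

Because the imputer is fitted inside the pipeline, its fill values are learned from the training split only and then reused when scoring the test split.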

Cross-validation and grid search

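import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Illustrative steps (assumed); the "knn" name must match the parameter prefix.
pipeline = Pipeline([("imputation", SimpleImputer()), ("knn", KNeighborsClassifier())])
parameters = {"knn__n_neighbors": np.arange(1, 50)}
cv = GridSearchCV(pipeline, param_grid=parameters) # (1)
cv.fit(X_train, y_train)  # best_score_ and best_params_ exist only after fitting
cv.best_score_
cv.best_params_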
  1. A pipeline can be passed directly to GridSearchCV. Because the pipeline has several steps, each parameter name must be prefixed with the name of its step and a double underscore: the "knn" step takes an n_neighbors argument, so the parameter is called knn__n_neighbors.
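
The naming rule can be seen in a self-contained sketch. The synthetic dataset, the explicit cv=5 setting and the imputation step are assumptions made for illustration; only the "knn" step name and the knn__n_neighbors grid come from the notes above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic data (assumed for illustration).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("imputation", SimpleImputer()),
    ("knn", KNeighborsClassifier()),
])

# "<step name>__<parameter name>" routes each candidate value to the right step.
parameters = {"knn__n_neighbors": np.arange(1, 50)}
cv = GridSearchCV(pipeline, param_grid=parameters, cv=5)
cv.fit(X_train, y_train)

print(cv.best_params_)            # the best n_neighbors found
print(cv.best_score_)             # mean cross-validated accuracy for it
print(cv.score(X_test, y_test))   # accuracy of the refitted best pipeline

# get_params() lists every name that is valid as a key in the grid.
valid_names = sorted(pipeline.get_params().keys())
```

If a key in the grid is misspelled, checking it against valid_names is a quick way to find the correct prefixed name.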