Impute
Split the data first! This ensures we are not leaking information about our test set to our model.
- We use different imputation methods on numerical and categorical variables. Thus,
  - Create `X_cat` and `X_num`.
  - Create categorical training and test sets.
  - Use the same `random_state` for numerical training and test sets.
  - Use different imputers for `X_train_cat` and `X_train_num`
- Create
An imputer is a transformer
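
The steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code used here: it assumes scikit-learn's `SimpleImputer` and `train_test_split`, a hypothetical toy DataFrame `X`, and the `median` / `most_frequent` strategies, none of which are specified in the notes. Because an imputer is a transformer, it is fitted on the training set only and then applied to the test set.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy data with missing values in both column types.
X = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0, np.nan, 58.0, 22.0, 47.0],
    "income": [30.0, 52.0, np.nan, 41.0, 38.0, 90.0, np.nan, 75.0],
    "city": pd.Series(
        ["Oslo", "Rome", np.nan, "Oslo", "Rome", np.nan, "Oslo", "Oslo"],
        dtype=object,
    ),
})

# Separate numerical and categorical columns.
X_num = X.select_dtypes(include="number")
X_cat = X.select_dtypes(exclude="number")

# Same random_state and test_size, so both splits select the same rows.
X_train_num, X_test_num = train_test_split(X_num, test_size=0.25, random_state=42)
X_train_cat, X_test_cat = train_test_split(X_cat, test_size=0.25, random_state=42)

# Different imputers per column type (strategies are assumptions).
num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")

# Fit on the training data only, then reuse the learned statistics on the test data.
X_train_num_imp = num_imputer.fit_transform(X_train_num)
X_test_num_imp = num_imputer.transform(X_test_num)
X_train_cat_imp = cat_imputer.fit_transform(X_train_cat)
X_test_cat_imp = cat_imputer.transform(X_test_cat)
```

The fill values live in `num_imputer.statistics_` and `cat_imputer.statistics_`; they are computed from the training rows alone, which is what keeps test-set information out of the model.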