Conventions

  • Order
    • Always X then y: input, then output.
    • Always y_test then y_pred / y_probs: known, then unknown…
  • Model
    • sklearn works on array values for most of its model fitting, so use .values, which returns the ndarray of the column we are looking at.
    • Visualize it as \(X = \begin{bmatrix}x_{1} \\ x_{2} \\ x_{3}\end{bmatrix}\) and \(Y = \begin{bmatrix}y_{1} & y_{2} & y_{3}\end{bmatrix}\), where each \(x_{i}\) is a row \(\begin{bmatrix}x_{i1} & x_{i2} & \dots & x_{in}\end{bmatrix}\). One caveat: when each \(x_{i}\) is a scalar (only one feature), we still need to pass \(\begin{bmatrix}[x_{1}] \\ [x_{2}] \\ [x_{3}]\end{bmatrix}\) instead of just \(\begin{bmatrix}x_{1} & x_{2} & x_{3}\end{bmatrix}\). So we call the ndarray's reshape method, X.reshape(-1, 1): the 1 tells NumPy that \(X\) should have exactly one column, and the -1 tells it to infer the number of rows (observations) from the data.
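
The conventions above can be sketched in a few lines. This is only an illustration: the DataFrame, its column names "x" and "y", the toy relation y = 2x + 1, and the choice of LinearRegression are all assumptions made up for the example, not part of these notes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical toy data: a single feature "x" and a target "y" = 2x + 1.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})
df["y"] = 2 * df["x"] + 1

x_flat = df["x"].values      # 1-D ndarray, shape (6,)
X = x_flat.reshape(-1, 1)    # 2-D ndarray, shape (6, 1): one column, rows inferred
y = df["y"].values           # 1-D target, shape (6,)

# Order: X then y (input, then output). A plain slice stands in for a real split.
X_train, X_test = X[:4], X[4:]
y_train, y_test = y[:4], y[4:]

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Order: y_test then y_pred (known, then unknown).
mse = mean_squared_error(y_test, y_pred)

print(x_flat.shape, X.shape, y_pred.shape)
```

Passing x_flat directly to fit would be rejected, since the estimator expects a 2-D array of shape (n_samples, n_features); the reshape to a single column is what makes the one-feature case work.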