Linear Regression & Classical ML in TS

Traditional Machine Learning (ML) models offer a bridge between basic statistical forecasting and complex deep learning. They provide high speed and interpretability, making them ideal for the "baseline model" stage of any time series project.

Advantages of Classical ML Models

  • Speed: Fast training and prediction times make them well suited to low-resource environments.
  • Interpretability: Models like Decision Trees and Linear Regression are highly transparent and easy to explain to stakeholders.
  • Flexibility: Can handle varied data types and structures with proper preprocessing.

Challenges

  • Feature Engineering Dependency: Model performance is directly tied to the quality of engineered features, so features must be chosen and constructed carefully.
  • Data Stationarity: Most classical models assume stationarity; significant preprocessing, such as differencing, is usually required.
  • Scalability: Certain models, like k-NN, struggle to perform efficiently with very large datasets.

Linear Regression

Linear Regression (LR) in a time series context typically involves regressing the current value on its own lags or external predictors.

\[y_{t} = \beta_{0} + \beta_{1}y_{t-1} + \beta_{2}y_{t-2} + \dots + \beta_{p}y_{t-p} + e_{t}\]
  • Applications:
    • Trend estimation.
    • Capturing seasonal effects through engineered features or dummy variables.
    • Serving as the Baseline model for comparative forecasting performance.

Feature Engineering Strategies

  1. Lagged Features: Using past values like \((y_{t-1}, y_{t-2})\) as inputs (e.g., using yesterday's temperature to predict today's).
  2. Rolling Statistics: Adding moving averages or rolling standard deviations. For example, using the average of the past 7 days to predict the next.
  3. Seasonality Indicators: Encoding calendar information (day-of-week, month, holidays) as dummy variables.
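
As a concrete illustration, here is a minimal sketch combining all three strategies with pandas and scikit-learn; the synthetic daily series, the 7-day window, and the column names are illustrative assumptions, not prescriptions:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy daily series (illustrative); replace with your own data
idx = pd.date_range("2023-01-01", periods=200, freq="D")
df = pd.DataFrame({"y": range(200)}, index=idx)

# 1. Lagged features: yesterday's and the day before's value
df["lag_1"] = df["y"].shift(1)
df["lag_2"] = df["y"].shift(2)

# 2. Rolling statistics: mean of the previous 7 days
#    (shift(1) keeps the window strictly in the past, avoiding leakage)
df["roll_mean_7"] = df["y"].shift(1).rolling(7).mean()

# 3. Seasonality indicators: day-of-week dummy variables
df["dow"] = df.index.dayofweek
df = pd.get_dummies(df, columns=["dow"])

df = df.dropna()  # drop rows where lags/rolling windows are undefined
X, y = df.drop(columns="y"), df["y"]
model = LinearRegression().fit(X, y)
```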

Limitations

LR assumes a strictly linear relationship and is highly sensitive to multicollinearity between features, which is common when using multiple overlapping lags.


Support Vector Machines (SVM)

The aim of an SVM is to find a hyperplane that best separates or fits the data in high-dimensional space.
- SVC (Classification): Separates data into different regimes or categories.
- SVR (Regression): Fits the data within a specified margin of tolerance (\(\epsilon\)). It penalizes only the points that lie outside this threshold to minimize error.
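
Formally, SVR minimizes the \(\epsilon\)-insensitive loss, which is zero for any residual that stays inside the margin:

\[L_{\epsilon}(y, \hat{y}) = \max(0, |y - \hat{y}| - \epsilon)\]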

Key Features

  • Kernel Functions: SVMs use kernels to model non-linear relationships:
    • Linear Kernel
    • Polynomial Kernel
    • RBF Kernel (Radial Basis Function): Ideal for highly non-linear data.
  • Robustness: Effective against outliers when using appropriate margin parameters.

Note

SVM is distance-based, so feature engineering must include normalization or standardization to put all features on a comparable scale.
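
A minimal sketch of an SVR forecasting setup under this constraint; the synthetic sine series, the lag count, and the C/\(\epsilon\) values are illustrative assumptions, not tuned choices:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
t = np.arange(300)
y = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(300)

# Lag matrix: predict y[t] from the previous 3 values
n_lags = 3
X = np.column_stack([y[i : len(y) - n_lags + i] for i in range(n_lags)])
target = y[n_lags:]

# Scale features inside a pipeline, since SVR is distance-based;
# the RBF kernel and C/epsilon values here are illustrative, not tuned
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
svr.fit(X, target)
```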


Random Forests

Random Forests combine multiple decision trees built on random subsets of the data and aggregate their results.
- Regression: Uses the average of all tree predictions.
- Classification: Uses majority voting.

Advantages & Limitations

  • Pros: Handles non-linear relationships excellently, is robust to overfitting (with enough trees), and requires minimal preprocessing/normalization compared to SVM.
  • Cons: Struggles significantly with extrapolation; it cannot predict values outside the range of its training data (demonstrated in the sketch below). It also becomes computationally expensive as the number of trees and their depth increase.
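
A minimal sketch of the extrapolation failure, assuming a synthetic linear trend; the sizes and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on an upward trend, then predict beyond the training range
t_train = np.arange(100).reshape(-1, 1)
y_train = 2.0 * t_train.ravel()  # values 0..198

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(t_train, y_train)

# Both predictions plateau near 198, the maximum seen in training,
# instead of following the trend toward 300 and 400
print(rf.predict(np.array([[150], [200]])))
```

A linear model would continue the trend; a forest can only average values already seen in its leaves.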

Comparative Summary

| Feature | Linear Regression | SVM (SVR) | Random Forests |
| --- | --- | --- | --- |
| Complexity | Low | Medium-High | Medium |
| Interpretability | High | Medium | Low-Medium |
| Non-linearity | Poor | Excellent (with kernels) | Excellent |
| Feature Engineering | Critical | Important | Minimal |
| Computational Cost | Low | High | Medium |
| Handling Large Data | Excellent | Poor | Good |