
L57 Linear Regression for TS

Advantages of classical ML models

  • Speed: Fast training and prediction, suitable for low-resource environments
  • Interpretability
    • Decision Trees (DT) and Linear Regression are easy to explain
  • Flexibility

Challenges

  • Feature Engineering Dependency
    • features must be chosen carefully
    • performance depends on the quality of the features
  • Data stationarity
    • many models assume a stationary series
    • preprocess via differencing (see the sketch after this list)
  • Scalability
    • some models struggle with large datasets (e.g., k-NN)
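
Where stationarity is the blocker, first-order differencing is the usual preprocessing step. A minimal sketch with pandas (the series values are made up for illustration):

```python
import pandas as pd

# Hypothetical monthly series with an upward trend; values are illustrative.
y = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119],
              index=pd.date_range("2024-01-01", periods=10, freq="MS"))

# First-order differencing: replace y_t with y_t - y_{t-1} so that a
# linear trend is removed and the mean becomes roughly constant.
y_diff = y.diff().dropna()
print(y_diff.head())
```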

Linear Regression

\[ y_{t} = \beta_{0} + \beta_{1}x_{t-1} + \beta_{2}x_{t-2} + \dots + \beta_{p}x_{t-p} + e_{t} \]
  • Applications
    • Trend estimation (see the sketch after this list)
    • Capturing seasonal effects (dummy variables / engineered features)
    • Linear regression models often serve as the baseline for forecasting
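
For trend estimation, the simplest baseline regresses the series on a time index. A minimal sketch with scikit-learn (the synthetic series is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic series: linear trend with slope 0.5 plus Gaussian noise.
rng = np.random.default_rng(0)
y = 0.5 * np.arange(100) + rng.normal(scale=2.0, size=100)

# The time index t itself is the single predictor.
t = np.arange(100).reshape(-1, 1)
trend = LinearRegression().fit(t, y)
print(trend.coef_[0], trend.intercept_)  # slope ~0.5, intercept ~0
```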

Feature Engineering

  1. Lagged Features
    • Use past values \((y_{t-1}, y_{t-2}, \dots)\) as predictors
    • e.g., predict today's temperature from yesterday's temperature
  2. Rolling Statistics
    • Add moving averages or rolling standard deviations as predictors
    • e.g., the average of the past 7 days to predict the next day
  3. Seasonality Indicators
    • Encode day-of-week, month, or holiday information as dummy variables (a combined sketch follows this list)
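
A sketch of all three feature types using pandas and scikit-learn (the daily series and column names are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily series; replace with real data.
df = pd.DataFrame({"y": range(60)},
                  index=pd.date_range("2024-01-01", periods=60, freq="D"))

# 1. Lagged features: y_{t-1} and y_{t-2}.
df["lag_1"] = df["y"].shift(1)
df["lag_2"] = df["y"].shift(2)

# 2. Rolling statistics over the previous 7 days (shifted so only past
#    values are used, avoiding leakage from the target).
df["roll_mean_7"] = df["y"].shift(1).rolling(7).mean()
df["roll_std_7"] = df["y"].shift(1).rolling(7).std()

# 3. Seasonality indicators: day-of-week dummy variables.
dow = pd.get_dummies(df.index.dayofweek, prefix="dow")
dow.index = df.index
df = df.join(dow).dropna()  # drop rows where lags/windows are undefined

X, y = df.drop(columns="y"), df["y"]
model = LinearRegression().fit(X, y)
```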

Advantages

  • Interpretability (clear insights)
  • Efficiency (computationally inexpensive)

Limitations

  • Assumes linearity
    • Cannot model non-linear relationship effectively
  • Sensitive to multicollinearity between features (overlapping lags)

Example

  • Forecasting stock prices
    • Predict the next day's closing price
    • Features (see the construction sketch after this list)
      • Previous closing price
      • Rolling averages (5-day or 10-day moving averages)
      • Volume traded
      • Relative Strength Index (RSI)
      • Moving Average Convergence Divergence (MACD)
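
A sketch of how those features could be assembled with pandas; the RSI and MACD definitions below are the common textbook ones (14-day simple-average RSI, 12/26-day EMA MACD), and the `prices` frame with `close`/`volume` columns is a hypothetical input:

```python
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    # Relative Strength Index from average gains vs. average losses.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)

def macd(close: pd.Series) -> pd.Series:
    # MACD line: 12-day EMA minus 26-day EMA of the close.
    ema_12 = close.ewm(span=12, adjust=False).mean()
    ema_26 = close.ewm(span=26, adjust=False).mean()
    return ema_12 - ema_26

def build_features(prices: pd.DataFrame) -> pd.DataFrame:
    # Everything is shifted by one day so that only information known
    # before the target day is used to predict that day's close.
    feats = pd.DataFrame(index=prices.index)
    feats["prev_close"] = prices["close"].shift(1)
    feats["ma_5"] = prices["close"].shift(1).rolling(5).mean()
    feats["ma_10"] = prices["close"].shift(1).rolling(10).mean()
    feats["volume"] = prices["volume"].shift(1)
    feats["rsi_14"] = rsi(prices["close"]).shift(1)
    feats["macd"] = macd(prices["close"]).shift(1)
    feats["target"] = prices["close"]  # next-day close, seen from t-1
    return feats.dropna()
```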

SVM

  • Aim: find a hyperplane that best
    • SVC: separates the data in feature space (for classification), or
    • SVR: fits the data within a margin of tolerance (for regression)
      • penalizes points that lie outside the threshold, minimizing error
      • optimizes the margin of tolerance (\(\epsilon\)) around the hyperplane (formalized below)
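
For reference, the standard \(\epsilon\)-insensitive SVR objective (a textbook formulation, not stated explicitly in these notes): slack variables \(\xi_{i}, \xi_{i}^{*}\) measure how far a point lies outside the \(\epsilon\)-tube, and \(C\) sets the penalty for those violations.

\[ \min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_{i} + \xi_{i}^{*} \right) \quad \text{subject to} \quad \begin{cases} y_{i} - (w^{\top}x_{i} + b) \le \epsilon + \xi_{i} \\ (w^{\top}x_{i} + b) - y_{i} \le \epsilon + \xi_{i}^{*} \\ \xi_{i},\, \xi_{i}^{*} \ge 0 \end{cases} \]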

Applications

  • Time series forecasting (regression)
  • Anomaly detection or regime change prediction (classification)

Kernel Functions

  • SVMs rely on kernel functions to model non-linear relationships
    • Linear kernel
    • Polynomial kernel
    • RBF kernel (for highly non-linear relationships)

Feature Engineering

  • Lagged Features
  • Seasonality Features
    • Encode periodic patterns
  • Normalization
    • Scale features so the SVM works efficiently (e.g., standardization); see the sketch after this list
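
A minimal end-to-end sketch with scikit-learn: build a lag matrix, standardize, and fit an RBF-kernel SVR (the synthetic series and the hyperparameter values are assumptions for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic series: a slow sine wave plus noise.
rng = np.random.default_rng(0)
series = np.sin(np.arange(200) / 10) + rng.normal(scale=0.1, size=200)

# Lag matrix: each row holds the previous p observations.
p = 5
X = np.array([series[i:i + p] for i in range(len(series) - p)])
y = series[p:]

# Standardize first (SVMs are scale-sensitive), then use the RBF
# kernel for the non-linear mapping.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X[:150], y[:150])
print(model.score(X[150:], y[150:]))  # R^2 on the held-out tail
```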

Advantages

  • Handles non-linear relationships via kernels
  • Robust to outliers when using appropriate margin parameters

Limitations

  • Computationally intensive on large datasets (training scales roughly quadratically with the number of samples)
  • Requires careful parameter tuning (see the grid-search sketch after this list)
    • \(\epsilon\)
    • kernel type
    • regularization parameter \(C\)
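
One common answer to the tuning burden is a grid search with time-series-aware cross-validation; a sketch under the same synthetic-data assumption as above (grid values are illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same synthetic lag matrix as in the previous sketch.
rng = np.random.default_rng(0)
series = np.sin(np.arange(200) / 10) + rng.normal(scale=0.1, size=200)
X = np.array([series[i:i + 5] for i in range(195)])
y = series[5:]

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])
grid = {
    "svr__kernel": ["rbf", "poly"],
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.01, 0.1],
}
# TimeSeriesSplit keeps each training fold strictly before its validation fold.
search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=5))
search.fit(X, y)
print(search.best_params_)
```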

Random Forests

  • Combine multiple decision trees trained on random subsets of the data and aggregate their predictions
    • averaging for regression
    • majority voting for classification
  • Advantages
    • captures non-linear relationships
    • robust to overfitting when many trees are used
    • no extensive preprocessing/normalization required
  • Limitations
    • Struggles with extrapolation (forecast outside training data range)
    • Computationally expensive for large datasets/many trees
    • less interpretable as complexity increases
  • Hyperparameters (see the sketch after this list)
    • Number of trees
    • maximum depth
    • number of features considered at each split
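
A minimal sketch with scikit-learn's RandomForestRegressor on the same synthetic lag matrix as the SVR examples (hyperparameter values are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Same synthetic lag matrix as in the SVR sketches.
rng = np.random.default_rng(0)
series = np.sin(np.arange(200) / 10) + rng.normal(scale=0.1, size=200)
X = np.array([series[i:i + 5] for i in range(195)])
y = series[5:]

# Number of trees, maximum depth, and features considered per split are
# the main knobs listed above.
model = RandomForestRegressor(n_estimators=200, max_depth=8,
                              max_features="sqrt", random_state=0)
model.fit(X[:150], y[:150])
print(model.score(X[150:], y[150:]))  # R^2 on the held-out tail
```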

Comparison

  Feature              Linear Regression   SVM (SVR)                  Random Forests
  Complexity           Low                 Medium-High                Medium
  Interpretability     High                Medium                     Low-Medium
  Non-linearity        Poor                Excellent (with kernels)   Excellent
  Feature Engineering  Critical            Important                  Minimal
  Computational Cost   Low                 High                       Medium
  Handling Large Data  Excellent           Poor                       Good