L57 Linear Regression for Time Series
Advantages of classical ML models¶
- Speed: Fast training and prediction, suitable for low-resource environments
- Interpretability
    - decision trees (DT) and Linear Regression are easy to explain
- Flexibility
    - can incorporate arbitrary engineered features (lags, rolling statistics, calendar variables)
Challenges¶
- Feature Engineering Dependency
    - features must be chosen carefully
    - performance depends on the quality of the features
- Data stationarity
    - many models assume stationarity
    - preprocess with differencing (see the sketch after this list)
- Scalability
    - some models struggle with large datasets (k-NN, for example)
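As a quick illustration of that preprocessing step, here is a minimal pandas sketch of first-order differencing; the `sales` series is invented for the example:

```python
import pandas as pd

# Invented toy series with an upward trend (illustrative values only)
sales = pd.Series([100, 104, 109, 115, 122, 130],
                  index=pd.date_range("2024-01-01", periods=6, freq="D"))

# First-order differencing removes a linear trend: d_t = y_t - y_{t-1}
diffed = sales.diff().dropna()
print(diffed)
```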
Linear Regression¶
- Applications
    - Trend estimation
    - Capturing seasonal effects (dummy variables / engineered features)
- Linear regression models often serve as baseline models for forecasting
Feature Engineering¶
- Lagged Features
    - use past values \(y_{t-1}, y_{t-2}\) as predictors
    - e.g., predict today's temperature from yesterday's temperature
- Rolling Statistics
    - add moving averages or rolling standard deviations as predictors
    - e.g., the average of the past 7 days to predict the next day
- Seasonality Indicators
    - encode day-of-week, month, or holiday information as dummy variables (a sketch combining all three feature types follows)
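The following is a minimal pandas/scikit-learn sketch of these three feature types feeding a linear regression; the series, column names, and window sizes are illustrative assumptions rather than a prescribed recipe:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily series; the values and the column name `y` are made up
df = pd.DataFrame(
    {"y": range(100)},
    index=pd.date_range("2024-01-01", periods=100, freq="D"),
)

# Lagged features: y_{t-1} and y_{t-2}
df["lag1"] = df["y"].shift(1)
df["lag2"] = df["y"].shift(2)

# Rolling statistics over the past 7 days (shifted so only past values enter)
df["roll_mean_7"] = df["y"].shift(1).rolling(7).mean()
df["roll_std_7"] = df["y"].shift(1).rolling(7).std()

# Seasonality indicators: day-of-week dummy variables
dow = pd.get_dummies(df.index.dayofweek, prefix="dow").set_index(df.index)
df = df.join(dow).dropna()

X, y = df.drop(columns="y"), df["y"]
model = LinearRegression().fit(X, y)
print(model.coef_)
```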
Advantages¶
- Interpretability (clear insights)
- Efficiency (computationally inexpensive)
Limitations¶
- Assumes linearity
    - cannot model non-linear relationships effectively
- Sensitive to multicollinearity between features (overlapping lags)
Example¶
- Forecasting stock prices
    - predict the next day's closing price
    - Features (assembled in the sketch below)
        - previous closing prices
        - rolling averages (5-day or 10-day moving averages)
        - volume traded
        - relative strength index (RSI)
        - moving average convergence divergence (MACD)
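A hedged sketch of how such a feature set might be assembled; the price data is synthetic, and the RSI/MACD formulas below are the simple textbook variants, not a trading recommendation:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for real OHLCV data (a real frame would be loaded instead)
rng = np.random.default_rng(42)
idx = pd.date_range("2023-01-01", periods=300, freq="B")
prices = pd.DataFrame({
    "close": 100 + np.cumsum(rng.normal(0, 1, 300)),
    "volume": rng.integers(1_000, 5_000, 300),
}, index=idx)

feats = pd.DataFrame(index=prices.index)
feats["prev_close"] = prices["close"].shift(1)
feats["ma5"] = prices["close"].shift(1).rolling(5).mean()
feats["ma10"] = prices["close"].shift(1).rolling(10).mean()
feats["volume"] = prices["volume"].shift(1)

# Simple (SMA-based) 14-day RSI, lagged so only past data enters
delta = prices["close"].diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
feats["rsi14"] = (100 - 100 / (1 + gain / loss)).shift(1)

# MACD line: 12-day EMA minus 26-day EMA, also lagged by one day
ema12 = prices["close"].ewm(span=12, adjust=False).mean()
ema26 = prices["close"].ewm(span=26, adjust=False).mean()
feats["macd"] = (ema12 - ema26).shift(1)

X = feats.dropna()
y = prices["close"].loc[X.index]   # target: the day's closing price
model = LinearRegression().fit(X, y)
```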
SVM¶
- Aim: find a hyperplane that best
    - SVC: separates the data in feature space (for classification), or
    - SVR: fits the data within a margin of tolerance (for regression)
        - penalizes points that lie outside the tolerance threshold, minimizing error
        - optimizes the margin of tolerance (\(\epsilon\)) around the hyperplane
Applications¶
- Time series forecasting (regression)
- Anomaly detection or regime change prediction (classification)
Kernels¶
- SVMs rely on kernel functions to model non-linear relationships
    - Linear Kernel
    - Polynomial Kernel
    - RBF Kernel (for highly non-linear relationships)
Feature Engineering¶
- Lagged Features
- Seasonality Features
    - encode periodic patterns
- Normalization
    - scale features so the SVM works efficiently (standardization; see the pipeline sketch below)
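A minimal SVR sketch under these assumptions: toy sine-wave data, lag-1/lag-2 features, and arbitrary `C` and `epsilon` values, with standardization wrapped into the pipeline as recommended above:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy series: noisy sine wave; predict y_t from (y_{t-1}, y_{t-2})
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) * 0.1) + rng.normal(0, 0.1, 200)
X = np.column_stack([y[1:-1], y[:-2]])   # lag-1 and lag-2 features
target = y[2:]

# Standardization matters for SVMs: RBF distances are scale-sensitive
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, epsilon=0.05),
)
model.fit(X, target)
print(model.predict(X[-3:]))
```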
Advantages¶
- Handles non-linear relationships via kernels
- robust to outliers when using appropriate margin parameters
Limitations¶
- Computationally intensive on large datasets (roughly quadratic training complexity)
- Requires careful parameter tuning (see the tuning sketch after this list)
    - \(\epsilon\): width of the tolerance margin
    - kernel type
    - regularization parameter \(C\)
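One common way to tune these is a grid search over a time-series-aware split; the grid values below are arbitrary placeholders, and the toy data matches the previous sketch:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same toy lagged data as in the SVR sketch above
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) * 0.1) + rng.normal(0, 0.1, 200)
X, target = np.column_stack([y[1:-1], y[:-2]]), y[2:]

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])
grid = {
    "svr__kernel": ["rbf", "poly", "linear"],
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.01, 0.1, 0.5],
}
# TimeSeriesSplit keeps each training fold strictly before its test fold,
# so the search never peeks into the future
search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=5))
search.fit(X, target)
print(search.best_params_)
```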
Random Forests¶
- Combine multiple decision trees on random subsets of the data and aggregate their predictions
    - averaging for regression
    - majority voting for classification
- Advantages
    - capture non-linear relationships
    - robust to overfitting when many trees are used
    - no extensive preprocessing/normalization required
- Limitations
    - struggle with extrapolation (forecasting outside the training data range)
    - computationally expensive for large datasets/many trees
    - less interpretable as complexity increases
- Hyperparameters (see the sketch below)
    - number of trees
    - maximum depth
    - number of features considered per split
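A small sketch making those three hyperparameters explicit; the values are placeholders and the toy data matches the SVR sketch above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Same toy lagged data as in the SVR sketch above
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) * 0.1) + rng.normal(0, 0.1, 200)
X, target = np.column_stack([y[1:-1], y[:-2]]), y[2:]

rf = RandomForestRegressor(
    n_estimators=300,     # number of trees
    max_depth=8,          # maximum depth of each tree
    max_features="sqrt",  # features considered at each split
    random_state=0,
)
rf.fit(X, target)

# Predictions are averages over training-leaf values, which is why the
# forest cannot extrapolate beyond the target range seen during training
print(rf.predict(X[-3:]))
```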
| Criterion | Linear Regression | SVM (SVR) | Random Forests |
|---|---|---|---|
| Complexity | Low | Medium-High | Medium |
| Interpretability | High | Medium | Low-Medium |
| Non-linearity | Poor | Excellent (with kernels) | Excellent |
| Feature Engineering | Critical | Important | Minimal |
| Computational Cost | Low | High | Medium |
| Handling Large Data | Excellent | Poor | Good |