
L57 Linear Regression for TS

Advantages of classical ML models

  • Speed: Fast training and prediction, suitable for low-resource environments
  • Interpretability
    • Decision Trees (DT) and Linear Regression are easy to explain
  • Flexibility

Challenges

  • Feature Engineering Dependency
    • features must be chosen carefully
    • performance depends on the quality of the features
  • Data stationarity
    • many models assume a stationary series
    • preprocess via differencing (see the sketch after this list)
  • Scalability
    • some models struggle with large datasets (e.g., k-NN)
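
Where stationarity is the blocker, first-order differencing is the usual preprocessing step. A minimal sketch with pandas (the series values are made up for illustration):

```python
import pandas as pd

# Hypothetical monthly series with an upward trend; values are illustrative.
y = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119],
              index=pd.date_range("2024-01-01", periods=10, freq="MS"))

# First-order differencing: replace y_t with y_t - y_{t-1} so that a
# linear trend is removed and the mean becomes roughly constant.
y_diff = y.diff().dropna()
print(y_diff.head())
```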

Linear Regression

\[ y_{t} = \beta_{0} + \beta_{1}x_{t-1} + \beta_{2}x_{t-2} + \dots + \beta_{p}x_{t-p} + e_{t} \]
  • Applications
    • Trend estimation (see the sketch after this list)
    • Capturing seasonal effects (dummy variables / engineered features)
    • Linear regression models often serve as the baseline for forecasting
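
For trend estimation, the simplest baseline regresses the series on a time index. A minimal sketch with scikit-learn (the synthetic series is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic series: linear trend with slope 0.5 plus Gaussian noise.
rng = np.random.default_rng(0)
y = 0.5 * np.arange(100) + rng.normal(scale=2.0, size=100)

# The time index t itself is the single predictor.
t = np.arange(100).reshape(-1, 1)
trend = LinearRegression().fit(t, y)
print(trend.coef_[0], trend.intercept_)  # slope ~0.5, intercept ~0
```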

Feature Engineering

  1. Lagged Features
    • Use past values \((y_{t-1}, y_{t-2}, \dots)\) as predictors
    • e.g., predict today's temperature from yesterday's temperature
  2. Rolling Statistics
    • Add moving averages or rolling standard deviations as predictors
    • e.g., the average of the past 7 days to predict the next day
  3. Seasonality Indicators
    • Encode day-of-week, month, or holiday information as dummy variables (a combined sketch follows this list)
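
A sketch of all three feature types using pandas and scikit-learn (the daily series and column names are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily series; replace with real data.
df = pd.DataFrame({"y": range(60)},
                  index=pd.date_range("2024-01-01", periods=60, freq="D"))

# 1. Lagged features: y_{t-1} and y_{t-2}.
df["lag_1"] = df["y"].shift(1)
df["lag_2"] = df["y"].shift(2)

# 2. Rolling statistics over the previous 7 days (shifted so only past
#    values are used, avoiding leakage from the target).
df["roll_mean_7"] = df["y"].shift(1).rolling(7).mean()
df["roll_std_7"] = df["y"].shift(1).rolling(7).std()

# 3. Seasonality indicators: day-of-week dummy variables.
dow = pd.get_dummies(df.index.dayofweek, prefix="dow")
dow.index = df.index
df = df.join(dow).dropna()  # drop rows where lags/windows are undefined

X, y = df.drop(columns="y"), df["y"]
model = LinearRegression().fit(X, y)
```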

Advantages

  • Interpretability (clear insights)
  • Efficiency (computationally inexpensive)

Limitations

  • Assumes linearity
    • Cannot model non-linear relationship effectively
  • Sensitive to multicollinearity between features (overlapping lags)

Example

  • Forecasting stock prices
    • Predict the next day's closing price
    • Features (see the construction sketch after this list)
      • Previous closing price
      • Rolling averages (5-day or 10-day moving averages)
      • Volume traded
      • Relative Strength Index (RSI)
      • Moving Average Convergence Divergence (MACD)
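
A sketch of how those features could be assembled with pandas; the RSI and MACD definitions below are the common textbook ones (14-day simple-average RSI, 12/26-day EMA MACD), and the `prices` frame with `close`/`volume` columns is a hypothetical input:

```python
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    # Relative Strength Index from average gains vs. average losses.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)

def macd(close: pd.Series) -> pd.Series:
    # MACD line: 12-day EMA minus 26-day EMA of the close.
    ema_12 = close.ewm(span=12, adjust=False).mean()
    ema_26 = close.ewm(span=26, adjust=False).mean()
    return ema_12 - ema_26

def build_features(prices: pd.DataFrame) -> pd.DataFrame:
    # Everything is shifted by one day so that only information known
    # before the target day is used to predict that day's close.
    feats = pd.DataFrame(index=prices.index)
    feats["prev_close"] = prices["close"].shift(1)
    feats["ma_5"] = prices["close"].shift(1).rolling(5).mean()
    feats["ma_10"] = prices["close"].shift(1).rolling(10).mean()
    feats["volume"] = prices["volume"].shift(1)
    feats["rsi_14"] = rsi(prices["close"]).shift(1)
    feats["macd"] = macd(prices["close"]).shift(1)
    feats["target"] = prices["close"]  # next-day close, seen from t-1
    return feats.dropna()
```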

SVM

  • Aim: find a hyperplane that best
    • SVC: separates the data in feature space (for classification), or
    • SVR: fits the data within a margin of tolerance (for regression)
      • penalizes points that lie outside the threshold, minimizing error
      • optimizes the margin of tolerance (\(\epsilon\)) around the hyperplane (formalized below)
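
For reference, the standard \(\epsilon\)-insensitive SVR objective (a textbook formulation, not stated explicitly in these notes): slack variables \(\xi_{i}, \xi_{i}^{*}\) measure how far a point lies outside the \(\epsilon\)-tube, and \(C\) sets the penalty for those violations.

\[ \min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_{i} + \xi_{i}^{*} \right) \quad \text{subject to} \quad \begin{cases} y_{i} - (w^{\top}x_{i} + b) \le \epsilon + \xi_{i} \\ (w^{\top}x_{i} + b) - y_{i} \le \epsilon + \xi_{i}^{*} \\ \xi_{i},\, \xi_{i}^{*} \ge 0 \end{cases} \]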

Applications

  • Time series forecasting (regression)
  • Anomaly detection or regime change prediction (classification)

Kernel Functions

  • SVMs rely on kernel functions to model non-linear relationships
    • Linear kernel
    • Polynomial kernel
    • RBF kernel (for highly non-linear relationships)

Feature Engineering

  • Lagged Features
  • Seasonality Features
    • Encode periodic patterns
  • Normalization
    • Scale features so the SVM works efficiently (e.g., standardization); see the sketch after this list
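
A minimal end-to-end sketch with scikit-learn: build a lag matrix, standardize, and fit an RBF-kernel SVR (the synthetic series and the hyperparameter values are assumptions for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic series: a slow sine wave plus noise.
rng = np.random.default_rng(0)
series = np.sin(np.arange(200) / 10) + rng.normal(scale=0.1, size=200)

# Lag matrix: each row holds the previous p observations.
p = 5
X = np.array([series[i:i + p] for i in range(len(series) - p)])
y = series[p:]

# Standardize first (SVMs are scale-sensitive), then use the RBF
# kernel for the non-linear mapping.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X[:150], y[:150])
print(model.score(X[150:], y[150:]))  # R^2 on the held-out tail
```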

Advantages

  • Handles non-linear relationships via kernels
  • Robust to outliers when using appropriate margin parameters

Limitations

  • Computationally intensive on large datasets (training scales roughly quadratically with the number of samples)
  • Requires careful parameter tuning (see the grid-search sketch after this list)
    • \(\epsilon\)
    • kernel type
    • regularization parameter \(C\)
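
One common answer to the tuning burden is a grid search with time-series-aware cross-validation; a sketch under the same synthetic-data assumption as above (grid values are illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same synthetic lag matrix as in the previous sketch.
rng = np.random.default_rng(0)
series = np.sin(np.arange(200) / 10) + rng.normal(scale=0.1, size=200)
X = np.array([series[i:i + 5] for i in range(195)])
y = series[5:]

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])
grid = {
    "svr__kernel": ["rbf", "poly"],
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.01, 0.1],
}
# TimeSeriesSplit keeps each training fold strictly before its validation fold.
search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=5))
search.fit(X, y)
print(search.best_params_)
```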

Random Forests

  • Combine multiple decision trees trained on random subsets of the data and aggregate their predictions
    • averaging for regression
    • majority voting for classification
  • Advantages
    • captures non-linear relationships
    • robust to overfitting when many trees are used
    • no extensive preprocessing/normalization required
  • Limitations
    • Struggles with extrapolation (forecast outside training data range)
    • Computationally expensive for large datasets/many trees
    • less interpretable as complexity increases
  • Hyperparameters (see the sketch after this list)
    • Number of trees
    • maximum depth
    • number of features considered at each split
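
A minimal sketch with scikit-learn's RandomForestRegressor on the same synthetic lag matrix as the SVR examples (hyperparameter values are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Same synthetic lag matrix as in the SVR sketches.
rng = np.random.default_rng(0)
series = np.sin(np.arange(200) / 10) + rng.normal(scale=0.1, size=200)
X = np.array([series[i:i + 5] for i in range(195)])
y = series[5:]

# Number of trees, maximum depth, and features considered per split are
# the main knobs listed above.
model = RandomForestRegressor(n_estimators=200, max_depth=8,
                              max_features="sqrt", random_state=0)
model.fit(X[:150], y[:150])
print(model.score(X[150:], y[150:]))  # R^2 on the held-out tail
```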

Comparison

  Feature              Linear Regression   SVM (SVR)                  Random Forests
  Complexity           Low                 Medium-High                Medium
  Interpretability     High                Medium                     Low-Medium
  Non-linearity        Poor                Excellent (with kernels)   Excellent
  Feature Engineering  Critical            Important                  Minimal
  Computational Cost   Low                 High                       Medium
  Handling Large Data  Excellent           Poor                       Good