Other ML Models for TS

While advanced deep learning models capture the headlines, simpler neighbor-based and probabilistic models such as KNN and Naive Bayes remain highly effective for tasks like pattern recognition and rapid anomaly detection.

k-Nearest Neighbors (KNN)

KNN is a non-parametric method adapted for time series forecasting or classification. Instead of learning a global model, it identifies the most similar historical patterns (neighbors) and uses their known outcomes to predict future values.

Implementation Steps

  1. Supervised Transformation: Convert the time series into a supervised learning problem by using lagged values as features and the next value (or a class label) as the target.
  2. Similarity Measurement: Calculate the similarity between segments. Usually, each row of lagged values is treated as a vector in \(p\)-dimensional space, where \(p\) is the number of lags.
    • Euclidean Distance: \(d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}\) is the standard metric used to measure the "closeness" of patterns.
  3. Prediction: For a given test point, the algorithm finds the \(k\) nearest neighbors in the training set (a minimal sketch of the full workflow follows this list).
    • Regression: Output is the average of the neighbors' values.
    • Classification: Output is the majority class among the neighbors.
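
Below is that sketch: a one-step-ahead KNN forecast with scikit-learn. The toy sine series, the five-lag window, and \(k = 5\) are illustrative choices, not values from the text.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

def make_supervised(series, n_lags):
    """Step 1: turn a 1-D series into (lagged features, next value) pairs."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])   # p = n_lags lagged values
        y.append(series[t])              # value one step ahead
    return np.array(X), np.array(y)

# Toy series (illustrative only): a noisy sine wave.
series = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.randn(200)
X, y = make_supervised(series, n_lags=5)

# Scale the lags so no single one dominates the Euclidean distance (Step 2).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3 (regression): average the k nearest neighbors' targets.
knn = KNeighborsRegressor(n_neighbors=5, metric="euclidean")
knn.fit(X_scaled, y)

last_window = scaler.transform(series[-5:].reshape(1, -1))   # most recent 5 values
print("one-step-ahead forecast:", knn.predict(last_window)[0])
```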

Key Components

  • Feature Engineering: Critical for KNN. Includes lagged features (\(y_{t-1}, y_{t-2}\)) and optionally rolling statistics like moving averages to smooth noise.
  • Hyperparameters:
    • \(k\): The number of neighbors. Small \(k\) is sensitive to noise; large \(k\) may blur local patterns.
    • Distance Metric: Euclidean is standard, but Manhattan or Dynamic Time Warping (DTW) can be used for time-shifted series.
  • Normalization: Essential for KNN. Since it is distance-based, features must be scaled to ensure one lag doesn't dominate the distance calculation due to its magnitude.
  • Model Evaluation:
    • Regression: MSE (Mean Squared Error), MAE (Mean Absolute Error).
    • Classification: Precision, Recall, F1-Score, and Accuracy.
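
To illustrate how these components interact, here is a hedged sketch that tunes \(k\) and the distance metric with chronologically ordered cross-validation and scores candidates by MAE; the lag count, parameter grid, and toy data are assumptions made only for this example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy lag-feature matrix: 5 lags per row, next value as the target
# (the same supervised transformation sketched earlier).
series = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.randn(200)
X = np.array([series[t - 5:t] for t in range(5, len(series))])
y = series[5:]

# Scaling and KNN combined, so normalization is refit inside every CV fold.
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())
param_grid = {
    "kneighborsregressor__n_neighbors": [3, 5, 7, 11],          # small k: noisy; large k: blurred
    "kneighborsregressor__metric": ["euclidean", "manhattan"],  # candidate distance metrics
}

# TimeSeriesSplit keeps folds in chronological order, so the model never trains on the future.
search = GridSearchCV(pipe, param_grid,
                      cv=TimeSeriesSplit(n_splits=5),
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print("best params:", search.best_params_, "| MAE:", -search.best_score_)
```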

Advantages & Limitations

Advantages:
  • Simplicity: Easy to implement and interpret.
  • Non-parametric: Makes no assumptions about the underlying data distribution.
  • Non-linearity: Naturally handles complex, non-linear patterns.

Limitations:
  • Computational Cost: High for large datasets, since distances to every training point must be computed.
  • Curse of Dimensionality: Performance degrades as the number of lags (dimensions) increases.
  • Memory-based: No explicit model is created, making it less efficient for frequent, real-time predictions.

Example: Pattern Recognition

KNN excels at spotting recurring events or anomalies by identifying which sub-sequences of a time series are similar to a reference pattern.

Brief Steps for Recognition:
1. Define Reference: Identify a specific pattern of interest (e.g., a "head-and-shoulders" trend).
2. Segmentation: Use sliding, non-overlapping windows to capture local sub-sequences.
3. Feature Extraction: Compute summary stats (mean, variance, slopes) or Fourier coefficients to represent the sub-sequences compactly.
4. Training: Label historical segments as "Pattern Found" vs. "No Pattern" and fit the classifier.
5. Testing: Deploy on unseen data to trigger alerts when a segment's nearest neighbors match the reference (a sketch of these steps follows the list).
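
A brief sketch of steps 2–5, assuming non-overlapping windows summarized by mean, standard deviation, and slope; the window width, the summary features, and the placeholder labels are illustrative, not taken from the text.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def window_features(series, width, step):
    """Steps 2-3: cut the series into windows and summarize each one."""
    feats = []
    for start in range(0, len(series) - width + 1, step):
        w = series[start:start + width]
        slope = np.polyfit(np.arange(width), w, 1)[0]   # linear trend of the window
        feats.append([w.mean(), w.std(), slope])        # compact summary statistics
    return np.array(feats)

# Illustrative training data: windows already labeled 1 ("Pattern Found") or 0 ("No Pattern").
rng = np.random.default_rng(0)
train_series = rng.normal(size=2000)
X_train = window_features(train_series, width=50, step=50)    # non-overlapping windows
y_train = rng.integers(0, 2, size=len(X_train))               # placeholder labels only

# Step 4: scale the features and fit the classifier.
scaler = StandardScaler()
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(scaler.fit_transform(X_train), y_train)

# Step 5: score unseen windows and raise an alert when the neighbors say "Pattern Found".
new_series = rng.normal(size=500)
X_new = scaler.transform(window_features(new_series, width=50, step=50))
alerts = clf.predict(X_new)
print("windows flagged:", np.where(alerts == 1)[0])
```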

Other Applications

  • Industrial: Detecting abnormal vibrations or temperature fluctuations in machinery.
  • Finance: Identifying technical chart patterns (cup-and-handle) in stock prices.
  • Healthcare: Recognizing arrhythmias in ECG signals.
  • Meteorology: Detecting specific rainfall patterns in weather data.

Naive Bayes for TS

Naive Bayes addresses classification by applying Bayes' Theorem under the "naive" assumption that lagged values are conditionally independent given the class label.

\[P(Class \mid Lags) = \frac{P(Lags \mid Class) \cdot P(Class)}{P(Lags)}\]
  • Primary Use: Often used for binary classification to identify if a time segment belongs to a "normal" or "anomalous" class.
  • Suitability: Works well for high-dimensional data and serves as a fast, baseline probabilistic classifier, even though the independence assumption is technically violated in sequential data (lagged values are correlated).
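
As an illustration, a minimal Gaussian Naive Bayes sketch that labels each window of a series as normal or anomalous; the toy series, window width, and threshold-based labelling rule are assumptions made for this example only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Toy series with injected spikes; a window is "anomalous" (1) if it contains a spike.
rng = np.random.default_rng(1)
series = rng.normal(size=1000)
series[rng.integers(0, 1000, size=30)] += 6.0              # injected anomalies

width = 10
X = np.array([series[s:s + width] for s in range(0, len(series) - width + 1, width)])
y = (np.abs(X) > 4.0).any(axis=1).astype(int)              # illustrative labelling rule

# Chronological split: fit the class-conditional Gaussians on the first 80% of windows.
split = int(0.8 * len(X))
clf = GaussianNB()                                         # treats features as independent given the class
clf.fit(X[:split], y[:split])
print(classification_report(y[split:], clf.predict(X[split:]), zero_division=0))
```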