Markov Switching Models¶
Markov Switching Models (MSM) represent a powerful class of nonlinear time series models that capture regime-switching behavior. The fundamental difference between MSM and models like TAR or SETAR is the nature of the switch: while TAR/SETAR rely on observable variables to trigger a change, MSM assumes the transition is governed by a hidden (unobservable) Markov process.
Key Features of MSM¶
- Regime Switching: The system moves between different states (e.g., bull vs. bear markets).
- Markov Property: The current regime depends only on the immediately preceding regime.[^1]
- Hidden State: The actual regime at any time \(t\) is a latent variable that must be inferred from the data.
- Flexibility: Capable of capturing both abrupt and relatively smooth transitions based on the estimated state transition probabilities.
Model Formulation¶
The general form of an MSM with an AR(p) structure in each regime is:

$$y_{t} = \mu_{s_{t}} + \sum_{i=1}^{p} \phi_{s_{t},i}\, y_{t-i} + e_{t}$$

where:
- \(s_{t}\): The latent state (regime) taking values \(\{1, 2, \dots, k\}\).
- \(\mu_{s_{t}}\): Regime-specific mean.
- \(\phi_{s_{t},i}\): Regime-specific autoregressive coefficients.
- \(e_{t}\): Error terms, often with regime-specific variance \(\sigma_{s_{t}}^{2}\).
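To make the formulation concrete, here is a minimal simulation of a two-regime MS-AR(1) process in Python. All numbers (the transition matrix, regime means, AR coefficients, and variances) are illustrative assumptions, not estimates from any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters for k = 2 regimes
P = np.array([[0.95, 0.05],      # p_ij = P(s_t = j | s_{t-1} = i)
              [0.10, 0.90]])
mu    = np.array([0.5, -1.0])    # regime-specific means
phi   = np.array([0.3,  0.7])    # regime-specific AR(1) coefficients
sigma = np.array([1.0,  2.0])    # regime-specific error std. deviations

T = 500
s = np.zeros(T, dtype=int)       # latent (hidden) regime path
y = np.zeros(T)                  # observed series

for t in range(1, T):
    # Markov property: today's regime is drawn from the row of P for yesterday's regime
    s[t] = rng.choice(2, p=P[s[t - 1]])
    # Regime-specific AR(1) dynamics with regime-specific noise
    y[t] = mu[s[t]] + phi[s[t]] * y[t - 1] + sigma[s[t]] * rng.normal()

print("Share of time in each regime:", np.bincount(s) / T)
```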
Transition Probabilities¶
The movement between regimes is controlled by a State Transition Probability Matrix (\(P\)):
- \(p_{ij} = P(s_{t} = j \mid s_{t-1} = i)\) represents the probability of moving from state \(i\) to state \(j\).
- Constraint: Each row must sum to \(1\) (since the process must transition to some state).
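As a quick numerical illustration with an assumed two-state matrix, the snippet below verifies the row-sum constraint and computes two quantities commonly read off \(P\): the stationary (long-run) regime probabilities and the expected duration of each regime, \(1/(1 - p_{ii})\).

```python
import numpy as np

# Assumed two-state transition matrix; rows are "from", columns are "to"
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])

# Constraint: each row must sum to 1
assert np.allclose(P.sum(axis=1), 1.0)

# Stationary distribution: the left eigenvector of P associated with eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.isclose(eigvals, 1.0)]).ravel()
pi = pi / pi.sum()
print("Stationary regime probabilities:", pi)                   # ~[0.667, 0.333]

# Expected duration of regime i: 1 / (1 - p_ii)
print("Expected regime durations:", 1.0 / (1.0 - np.diag(P)))   # [20., 10.]
```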
Estimation & Interpretation¶
Because the states are hidden, we cannot use standard OLS. Instead, we use:
- Expectation-Maximization (EM) Algorithm:
- E-step: Calculate the expected values of the hidden states given the current parameters.
- M-step: Update the parameters (\(\mu, \phi, \sigma\)) to maximize the expected likelihood.
- Filtering and Smoothing:
- Hamilton Filter: Estimates the probability of being in a state at time \(t\) using only information up to \(t\).
- Kim Smoother: A backward-recursion smoother that improves the state estimates by using the full dataset, not just observations up to \(t\).
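In practice these steps are bundled inside libraries. As a hedged sketch, the statsmodels package provides MarkovAutoregression, which estimates the parameters by maximum likelihood built on the Hamilton filter and exposes both filtered and smoothed regime probabilities; the simulated data and all settings below (two regimes, AR(1), switching variance) are illustrative choices only.

```python
import numpy as np
import statsmodels.api as sm

# Toy data: a series that alternates between a calm and a volatile regime (illustrative)
rng = np.random.default_rng(1)
s_true = (np.arange(400) // 100) % 2
y = np.where(s_true == 0, 0.5, -1.0) + rng.normal(size=400) * np.where(s_true == 0, 1.0, 2.0)

# Two-regime AR(1) with regime-specific variance (assumed specification)
mod = sm.tsa.MarkovAutoregression(y, k_regimes=2, order=1,
                                  switching_ar=True, switching_variance=True)
res = mod.fit()

print(res.summary())
filtered = res.filtered_marginal_probabilities   # P(s_t = j | data up to t), Hamilton filter
smoothed = res.smoothed_marginal_probabilities   # P(s_t = j | full sample), smoother
print(res.expected_durations)                    # implied 1 / (1 - p_ii) per regime
```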
Other Nonlinear Extensions¶
Beyond MSM, several other models handle complex nonlinearities:
Bilinear Models¶
These incorporate products of past observations and past error terms to capture interactions that purely linear terms cannot:
$$y_{t} = \phi_{0} + \sum_{i=1}^{p} \phi_{i}y_{t-i} + \sum_{j=1}^{q} \sum_{k=1}^{p} \beta_{jk}y_{t-k} e_{t-j} + e_{t}$$
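A minimal simulation of the simplest bilinear case (p = q = 1, so the only interaction term is \(\beta_{11} y_{t-1} e_{t-1}\)); the coefficient values are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed coefficients: y_t = phi0 + phi1*y_{t-1} + beta11*y_{t-1}*e_{t-1} + e_t
phi0, phi1, beta11 = 0.1, 0.4, 0.3

T = 500
e = rng.normal(size=T)
y = np.zeros(T)

for t in range(1, T):
    y[t] = phi0 + phi1 * y[t - 1] + beta11 * y[t - 1] * e[t - 1] + e[t]

print(y[-5:])
```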
Nonlinear ARX (NARX) Models¶
The future value is a nonlinear function of past inputs and outputs:
$$y_{t} = f(y_{t-1}, y_{t-2}, \dots, x_{t-1}, x_{t-2}, \dots) + e_{t}$$
Used frequently in control systems and dynamic engineering modeling.
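As a toy illustration of a NARX data-generating process, the sketch below uses an assumed nonlinear \(f\) (a tanh of the lagged output plus a squared lagged input); both the functional form and the coefficients are arbitrary choices, not a standard specification.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 500
x = rng.normal(size=T)         # exogenous input series
e = 0.1 * rng.normal(size=T)   # noise
y = np.zeros(T)

for t in range(1, T):
    # y_t = f(y_{t-1}, x_{t-1}) + e_t with an assumed nonlinear f
    y[t] = 0.6 * np.tanh(y[t - 1]) + 0.8 * x[t - 1] ** 2 + e[t]
```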
Neural Network Based Models¶
- NAR-NN: Uses a feed-forward network to model the nonlinear autoregressive relationship (see the sketch after this list).
- LSTM: A recurrent architecture designed to handle long-term dependencies in sequential data, mitigating the "vanishing gradient" problem that limits standard recurrent networks on long nonlinear time series.
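Here is a minimal sketch of the NAR-NN idea: a one-hidden-layer feed-forward network that maps the last \(p\) lags to a one-step-ahead prediction. The weights are random placeholders; in a real application they would be trained (for example by gradient descent), which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

p, hidden = 3, 8                          # number of lags and hidden units (assumed)
W1, b1 = rng.normal(size=(hidden, p)), np.zeros(hidden)
W2, b2 = rng.normal(size=hidden), 0.0     # untrained placeholder weights

def nar_nn_predict(lags):
    """One-step-ahead forecast y_hat_t = g(y_{t-1}, ..., y_{t-p})."""
    h = np.tanh(W1 @ lags + b1)           # nonlinear hidden layer
    return W2 @ h + b2                    # linear output layer

print(nar_nn_predict(np.array([0.2, -0.1, 0.4])))   # lags y_{t-1}, y_{t-2}, y_{t-3}
```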
Hybrid & Specialized Models¶
- ST-GARCH: A Smooth Transition GARCH in which the variance equation itself transitions smoothly between regimes.
- Functional Time Series: Models changes in entire curves or functions over time (e.g., yield curves).
- Wavelet Transform Models: Decompose the series into time and frequency components simultaneously (see the sketch below).
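As one concrete illustration of the wavelet idea, the PyWavelets package (pywt) can split a series into coarse approximation and fine detail components; the wavelet family ('db4') and the decomposition level are arbitrary illustrative choices.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(0)
# Toy series: slow oscillation + fast oscillation + noise
t = np.linspace(0, 1, 512)
y = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 40 * t) + 0.1 * rng.normal(size=512)

# Multilevel discrete wavelet decomposition (Daubechies-4, 3 levels)
coeffs = pywt.wavedec(y, 'db4', level=3)   # [cA3, cD3, cD2, cD1]
print([c.shape for c in coeffs])

# The coefficients reconstruct the original series (up to numerical precision)
y_rec = pywt.waverec(coeffs, 'db4')
print(np.allclose(y, y_rec[:len(y)]))
```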
[^1]: Markov Property: Today's state depends only on yesterday's state.