Neural Networks for Time Series¶
Artificial Neural Networks (ANNs) are simple mathematical models inspired by the biological brain. In the context of time series, they are used to approximate complex, non-linear functions that relate past observations to future forecasts.
Network Architecture¶
The network consists of neurons organized into distinct layers:
* Input Layer (Bottom): These are the predictors or lags (e.g., \(y_{t-1}, y_{t-2}\)).
* Hidden Layers (Intermediate): These contain "hidden neurons" that perform non-linear transformations.
* Output Layer (Top): This layer produces the final forecasts.
The structure loosely mimics the way our brain solves problems by passing signals through layers of interconnected nodes.
The "Perceptron"¶
A Perceptron is the simplest form of a neural network node. It is a "feed-forward" system, meaning the signal moves in only one direction—from input to output.
* A neuron receives \(n\) inputs (\(x_{1}, x_{2}, \dots, x_{n}\)).
* Each input is multiplied by a weight; the weighted inputs are then summed and passed through an activation function to produce the output.
Weights¶
Weights represent the "importance" of a specific input.
Note
We don't manually tell the network which conditions (like weather or weekdays) are important; it learns the optimal weights for itself from the training data.
Training the Perceptron¶
The weights are updated iteratively to reduce the error between the forecast (\(A\)) and the actual target (\(T\)). For each input \(i\), the update rule is:

\(W(i) \leftarrow W(i) + a \cdot g'(\text{in}) \cdot (T - A) \cdot P(i)\)

Where:
* \(g'\): The derivative of the activation function, evaluated at the weighted input sum \(\text{in}\).
* \(a\): The learning rate (step size of the update).
* \(P(i)\): The \(i\)-th component of the input vector \(P\).
* \((T - A)\): The error (Target minus Actual).
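A minimal sketch of this rule in Python, assuming a sigmoid activation for \(g\) (all weights, inputs, and the target below are illustrative):

```python
import numpy as np

def g(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

def g_prime(x):
    """Derivative of the sigmoid: g(x) * (1 - g(x))."""
    s = g(x)
    return s * (1.0 - s)

def update_weights(W, P, T, a=0.1):
    """One delta-rule step: W(i) <- W(i) + a * g'(in) * (T - A) * P(i)."""
    weighted_in = np.dot(W, P)  # "in": the weighted sum of the inputs
    A = g(weighted_in)          # actual output of the perceptron
    return W + a * g_prime(weighted_in) * (T - A) * P

W = np.array([0.2, -0.5, 0.1])   # current weights (illustrative)
P = np.array([1.0, 0.3, -0.7])   # input vector, e.g. lagged observations
W = update_weights(W, P, T=1.0)  # nudge every weight toward the target
```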
Activation Functions¶
Activation functions are mathematical equations that determine whether a neuron should "fire" based on its inputs. This is the "secret ingredient" that allows neural networks to model complex, non-linear relationships.
- Sigmoid: \(f(x) = \dfrac{1}{1+e^{-x}}\)
  - Range: \((0, 1)\).
  - Use Case: Binary classification or output layers for probabilities.
- Tanh: \(f(x) = \dfrac{e^x - e^{-x}}{e^x + e^{-x}}\)
  - Range: \((-1, 1)\).
  - Use Case: Hidden layers where the output needs to be zero-centered.
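Both are one-liners in NumPy; a quick sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes any input into (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes into (-1, 1), zero-centered

x = np.linspace(-4, 4, 9)
print(sigmoid(x))  # approaches 0 and 1 at the extremes
print(tanh(x))     # approaches -1 and 1 at the extremes
```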
Hidden Layers & Complexity¶
The number of hidden layers determines the "depth" and complexity of the model.
- No Hidden Layers:
  - If there are no hidden layers and the output activation is linear, the model reduces to a standard Linear Regression.
- Multilayer Feed-forward:
  - Inputs to each node are combined using a weighted linear combination and then passed through an activation function (see the sketch below).
  - Increasing the number of hidden layers allows the network to capture increasingly complicated dynamics, but also increases the risk of overfitting.
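As a sketch of that forward pass, here is one hidden layer with tanh units and a linear output for regression (all shapes and random weights are illustrative):

```python
import numpy as np

# Illustrative shapes: 3 lagged inputs -> 4 hidden nodes -> 1 forecast.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden -> output

x = np.array([0.5, -0.2, 0.9])  # e.g., y_{t-1}, y_{t-2}, y_{t-3}

hidden = np.tanh(W1 @ x + b1)   # weighted linear combination + activation
forecast = W2 @ hidden + b2     # linear output layer for regression
print(forecast)
```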
Neural Network Autoregression (NNAR)¶
NNAR specifically uses lagged values of a time series as inputs to a feed-forward neural network.
* \(NNAR(p, k)\):
- \(p\): Number of lagged inputs (e.g., \(y_{t-1}, \dots, y_{t-p}\)).
- \(k\): Number of nodes in the single hidden layer.
* Special Case: \(NNAR(p, 0)\) is equivalent to a linear \(AR(p)\) model.
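In R this model is fitted by nnetar() from the forecast package; a rough Python analogue, assuming scikit-learn is available, builds the lag matrix by hand and fits a single-hidden-layer network (an NNAR(3, 5)-style setup on synthetic data, with every setting illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def lag_matrix(y, p):
    """Rows are [y_{t-1}, ..., y_{t-p}]; targets are the matching y_t."""
    X = np.column_stack([y[p - i - 1 : len(y) - i - 1] for i in range(p)])
    return X, y[p:]

# Synthetic series: a sine wave plus noise.
rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 20, 200)) + rng.normal(0, 0.1, 200)

p, k = 3, 5  # NNAR(p, k): p lagged inputs, k hidden nodes
X, target = lag_matrix(y, p)
model = MLPRegressor(hidden_layer_sizes=(k,), max_iter=2000, random_state=0)
model.fit(X, target)

next_input = y[-p:][::-1].reshape(1, -1)  # most recent lag first
print(model.predict(next_input))          # one-step-ahead forecast
```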
Training Mechanics: Batch, Iterations & Epoch¶
Large datasets cannot be passed through the network all at once due to memory constraints, so training splits the data into smaller pieces:
- Batch: A subset of the dataset. Batch size is a hyperparameter.
- Iterations: The number of batches needed to complete one full pass of the data.
- Epoch: One Epoch occurs when the ENTIRE dataset has been passed forward and backward through the network exactly ONCE.
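The arithmetic is simple; with illustrative numbers:

```python
n_samples = 1000   # size of the training set
batch_size = 100   # hyperparameter

iterations_per_epoch = n_samples // batch_size  # 10 batches per full pass
epochs = 5
total_updates = epochs * iterations_per_epoch   # 50 weight updates overall
print(iterations_per_epoch, total_updates)
```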
The Training Curve¶
Neural networks use gradient descent to update weights across multiple epochs. As the number of epochs increases, the model transitions through three phases:
1. Underfitting: The model hasn't learned the patterns yet.
2. Optimal: The model has learned the general trends.
3. Overfitting: The model has memorized the noise in the data, leading to poor out-of-sample performance.
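A common way to catch the transition from phase 2 to phase 3 is to track the validation loss each epoch and stop once it starts rising; a minimal early-stopping sketch over an invented loss curve (all numbers illustrative):

```python
# Hypothetical validation losses: they fall while learning, rise once overfitting.
val_losses = [0.90, 0.55, 0.40, 0.33, 0.31, 0.32, 0.35, 0.41]

best_val, patience, bad_epochs = float("inf"), 2, 0
for epoch, val in enumerate(val_losses):
    if val < best_val:
        best_val, bad_epochs = val, 0  # phases 1-2: still improving
    else:
        bad_epochs += 1                # validation worsening: phase 3?
        if bad_epochs >= patience:
            print(f"stop at epoch {epoch}: overfitting suspected")
            break
```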