Time Series Forecasting with LSTM Networks
How to build and tune LSTM models for time series prediction, with lessons learned from real-world sensor data projects.
Time series data is everywhere — sensor readings, stock prices, weather patterns, server metrics. Traditional statistical methods like ARIMA work well for linear patterns, but when the underlying relationships are complex and non-linear, Long Short-Term Memory (LSTM) networks offer a compelling alternative.
Why LSTMs for Time Series?
LSTMs are a type of recurrent neural network (RNN) designed to learn long-term dependencies. Unlike vanilla RNNs that suffer from the vanishing gradient problem, LSTMs use a gating mechanism to selectively remember or forget information across time steps.
This makes them particularly well-suited for:
- Sequences where context from far back matters (e.g., seasonal patterns)
- Multivariate time series where multiple features interact
- Data with non-linear and non-stationary characteristics
Data Preparation: The Critical Step
The most common mistake in LSTM-based forecasting is poor data preparation. Here's what I've learned:
Windowing
LSTMs expect input shaped as (samples, timesteps, features). You need to create sliding windows:
import numpy as np

def create_sequences(data, window_size, horizon=1):
    """Slice a series into overlapping (lookback window, forecast target) pairs."""
    X, y = [], []
    for i in range(len(data) - window_size - horizon + 1):
        X.append(data[i:i + window_size])                             # lookback window
        y.append(data[i + window_size:i + window_size + horizon])     # forecast target
    return np.array(X), np.array(y)

# Example: 30-step lookback, 1-step forecast
X, y = create_sequences(scaled_data, window_size=30, horizon=1)
# For a univariate series of shape (n, 1), X comes out as (samples, 30, 1) and
# y as (samples, 1, 1); flatten y's trailing axis so targets match the model's
# (samples, 1) output.
y = y.reshape(len(y), -1)
Normalization
Always normalize before windowing. I prefer MinMaxScaler for bounded data and StandardScaler for unbounded:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data.reshape(-1, 1))
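One caveat: calling fit_transform on the full series bakes test-set statistics into the scaler, which is exactly the data-leakage pitfall discussed later. A minimal leak-free sketch, assuming data is the raw 1-D series and an 80/20 chronological split, looks like this:

split = int(len(data) * 0.8)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(data[:split].reshape(-1, 1))          # learn min/max from the training portion only
scaled = scaler.transform(data.reshape(-1, 1))   # apply those statistics to the whole series

The rest of the pipeline (windowing, then a chronological split of the windows) stays the same.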
Train/Test Split
For time series, never shuffle. Always split chronologically:
split_idx = int(len(X) * 0.8)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]
Building the Model
A solid starting architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(30, 1)),  # input_shape = (timesteps, features), matching the 30-step window
    Dropout(0.2),
    LSTM(32, return_sequences=False),
    Dropout(0.2),
    Dense(16, activation="relu"),
    Dense(1)
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
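To actually train the model, here is a minimal sketch assuming the X_train/y_train arrays from the chronological split above; the EarlyStopping callback implements the validation-loss monitoring recommended in the pitfalls section:

from tensorflow.keras.callbacks import EarlyStopping

# Keras takes the validation split from the end of the arrays before any
# shuffling, so the most recent windows serve as validation data.
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_split=0.1,
    epochs=100,
    batch_size=32,
    callbacks=[early_stop],
)

The epoch count, batch size, and patience here are starting points, not tuned values.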
Key Hyperparameters
- Window size: Depends on your data's periodicity. For daily data with weekly patterns, try 7-14. For hourly data, try 24-168.
- Hidden units: Start with 32-128. More isn't always better — overfitting is common.
- Dropout: 0.1-0.3 works well for regularization.
- Learning rate: Adam's default (0.001) is usually fine; reduce if training is unstable.
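If training is unstable, two options are to compile with a smaller explicit Adam learning rate, or to keep the default and let ReduceLROnPlateau cut it when validation loss stalls. The sketch below shows both; the 5e-4, factor, and patience values are illustrative, not tuned:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Option 1: a smaller fixed learning rate.
model.compile(optimizer=Adam(learning_rate=5e-4), loss="mse", metrics=["mae"])

# Option 2: halve the learning rate whenever validation loss plateaus.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5, min_lr=1e-5)
# Pass reduce_lr in the callbacks list alongside early stopping when calling model.fit().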
Avoiding Common Pitfalls
- Data leakage: Fit your scaler only on training data, then transform the test data with the same scaler (as in the scaling sketch in the normalization section above).
- Stationarity assumption: LSTMs can handle non-stationary data, but differencing can still help.
- Overfitting: Use early stopping and monitor validation loss.
- Multi-step forecasting: Recursive prediction (feeding predictions back in as inputs) accumulates error step by step. Consider a direct multi-output model instead; see the sketch below.
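To make that last point concrete, here is a sketch of a direct multi-output variant that predicts every step of the horizon in one forward pass, reusing create_sequences and the scaled series from earlier; the 12-step horizon is an assumption for illustration:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

HORIZON = 12  # illustrative multi-step horizon

X, y = create_sequences(scaled, window_size=30, horizon=HORIZON)
y = y.reshape(len(y), HORIZON)  # one row of 12 future values per window

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(30, 1)),
    Dropout(0.2),
    LSTM(32),
    Dense(HORIZON)  # all 12 steps predicted at once, so no error feedback loop
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

The trade-off is that each horizon step gets its own output unit, which usually needs more training data than a single-step model.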
Evaluation Beyond MSE
MSE and MAE tell part of the story, but also consider:
- MAPE for percentage-based error
- Directional accuracy — does the model predict the direction of change correctly?
- Visual inspection — always plot predictions vs. actuals. Numbers can be misleading.
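Here is a small sketch of the two extra metrics. The helper names mape and directional_accuracy are hypothetical, and the code assumes the single-step model from earlier, with predictions and targets mapped back to the original scale using the fitted scaler:

import numpy as np

def mape(actual, predicted, eps=1e-8):
    # Mean absolute percentage error; eps guards against division by zero.
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted) / (np.abs(actual) + eps)) * 100

def directional_accuracy(actual, predicted):
    # Fraction of steps where the predicted change from the previous actual
    # value has the same sign as the realised change.
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    actual_dir = np.sign(np.diff(actual))
    pred_dir = np.sign(predicted[1:] - actual[:-1])
    return np.mean(actual_dir == pred_dir)

preds = scaler.inverse_transform(model.predict(X_test)).ravel()
actuals = scaler.inverse_transform(y_test.reshape(-1, 1)).ravel()
print(f"MAPE: {mape(actuals, preds):.2f}%")
print(f"Directional accuracy: {directional_accuracy(actuals, preds):.2%}")

For visual inspection, a single line plot of actuals and preds over the test index is usually enough to reveal lagged or flat predictions that scalar metrics hide.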
Conclusion
LSTMs are powerful tools for time series forecasting, but they're not magic. The fundamentals — proper data preparation, careful windowing, appropriate normalization, and honest evaluation — matter more than architectural complexity. Start simple, validate rigorously, and add complexity only when justified by the data.