Time Series Forecasting with LSTM Networks
How to build and tune LSTM models for time series prediction, with lessons learned from real-world sensor data projects.
Time series data is everywhere — sensor readings, stock prices, weather patterns, server metrics. Traditional statistical methods like ARIMA work well for linear patterns, but when the underlying relationships are complex and non-linear, Long Short-Term Memory (LSTM) networks offer a compelling alternative.
Why LSTMs for Time Series?
LSTMs are a type of recurrent neural network (RNN) designed to learn long-term dependencies. Unlike vanilla RNNs that suffer from the vanishing gradient problem, LSTMs use a gating mechanism to selectively remember or forget information across time steps.
This makes them particularly well-suited for:
- Sequences where context from far back matters (e.g., seasonal patterns)
- Multivariate time series where multiple features interact
- Data with non-linear and non-stationary characteristics
Data Preparation: The Critical Step
The most common mistake in LSTM-based forecasting is poor data preparation. Here's what I've learned:
Windowing
LSTMs expect input shaped as (samples, timesteps, features). You need to create sliding windows:
import numpy as np

def create_sequences(data, window_size, horizon=1):
    """Slice a series into overlapping (lookback window, forecast target) pairs."""
    X, y = [], []
    for i in range(len(data) - window_size - horizon + 1):
        X.append(data[i:i + window_size])                             # lookback window
        y.append(data[i + window_size:i + window_size + horizon])     # forecast target
    return np.array(X), np.array(y)

# Example: 30-step lookback, 1-step forecast
X, y = create_sequences(scaled_data, window_size=30, horizon=1)
# For a univariate series of shape (n, 1), X comes out as (samples, 30, 1) and
# y as (samples, 1, 1); flatten y's trailing axis so targets match the model's
# (samples, 1) output.
y = y.reshape(len(y), -1)
Normalization
Always normalize before windowing. I prefer MinMaxScaler for bounded data and StandardScaler for unbounded:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data.reshape(-1, 1))
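One caveat: calling fit_transform on the full series bakes test-set statistics into the scaler, which is exactly the data-leakage pitfall discussed later. A minimal leak-free sketch, assuming data is the raw 1-D series and an 80/20 chronological split, looks like this:

split = int(len(data) * 0.8)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(data[:split].reshape(-1, 1))          # learn min/max from the training portion only
scaled = scaler.transform(data.reshape(-1, 1))   # apply those statistics to the whole series

The rest of the pipeline (windowing, then a chronological split of the windows) stays the same.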
Train/Test Split
For time series, never shuffle. Always split chronologically:
split_idx = int(len(X) * 0.8)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]
Building the Model
A solid starting architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(30, 1)),  # input_shape = (timesteps, features), matching the 30-step window
    Dropout(0.2),
    LSTM(32, return_sequences=False),
    Dropout(0.2),
    Dense(16, activation="relu"),
    Dense(1)
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
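To actually train the model, here is a minimal sketch assuming the X_train/y_train arrays from the chronological split above; the EarlyStopping callback implements the validation-loss monitoring recommended in the pitfalls section:

from tensorflow.keras.callbacks import EarlyStopping

# Keras takes the validation split from the end of the arrays before any
# shuffling, so the most recent windows serve as validation data.
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_split=0.1,
    epochs=100,
    batch_size=32,
    callbacks=[early_stop],
)

The epoch count, batch size, and patience here are starting points, not tuned values.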
Key Hyperparameters
- Window size: Depends on your data's periodicity. For daily data with weekly patterns, try 7-14. For hourly data, try 24-168.
- Hidden units: Start with 32-128. More isn't always better — overfitting is common.
- Dropout: 0.1-0.3 works well for regularization.
- Learning rate: Adam's default (0.001) is usually fine; reduce if training is unstable.
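If training is unstable, two options are to compile with a smaller explicit Adam learning rate, or to keep the default and let ReduceLROnPlateau cut it when validation loss stalls. The sketch below shows both; the 5e-4, factor, and patience values are illustrative, not tuned:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Option 1: a smaller fixed learning rate.
model.compile(optimizer=Adam(learning_rate=5e-4), loss="mse", metrics=["mae"])

# Option 2: halve the learning rate whenever validation loss plateaus.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5, min_lr=1e-5)
# Pass reduce_lr in the callbacks list alongside early stopping when calling model.fit().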
Avoiding Common Pitfalls
- Data leakage: Fit your scaler only on training data, then transform the test data with the same scaler (as in the scaling sketch in the normalization section above).
- Stationarity assumption: LSTMs can handle non-stationary data, but differencing can still help.
- Overfitting: Use early stopping and monitor validation loss.
- Multi-step forecasting: Recursive prediction (feeding predictions back in as inputs) accumulates error step by step. Consider a direct multi-output model instead; see the sketch below.
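To make that last point concrete, here is a sketch of a direct multi-output variant that predicts every step of the horizon in one forward pass, reusing create_sequences and the scaled series from earlier; the 12-step horizon is an assumption for illustration:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

HORIZON = 12  # illustrative multi-step horizon

X, y = create_sequences(scaled, window_size=30, horizon=HORIZON)
y = y.reshape(len(y), HORIZON)  # one row of 12 future values per window

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(30, 1)),
    Dropout(0.2),
    LSTM(32),
    Dense(HORIZON)  # all 12 steps predicted at once, so no error feedback loop
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

The trade-off is that each horizon step gets its own output unit, which usually needs more training data than a single-step model.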
Evaluation Beyond MSE
MSE and MAE tell part of the story, but also consider:
- MAPE for percentage-based error
- Directional accuracy — does the model predict the direction of change correctly?
- Visual inspection — always plot predictions vs. actuals. Numbers can be misleading.
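Here is a small sketch of the two extra metrics. The helper names mape and directional_accuracy are hypothetical, and the code assumes the single-step model from earlier, with predictions and targets mapped back to the original scale using the fitted scaler:

import numpy as np

def mape(actual, predicted, eps=1e-8):
    # Mean absolute percentage error; eps guards against division by zero.
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted) / (np.abs(actual) + eps)) * 100

def directional_accuracy(actual, predicted):
    # Fraction of steps where the predicted change from the previous actual
    # value has the same sign as the realised change.
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    actual_dir = np.sign(np.diff(actual))
    pred_dir = np.sign(predicted[1:] - actual[:-1])
    return np.mean(actual_dir == pred_dir)

preds = scaler.inverse_transform(model.predict(X_test)).ravel()
actuals = scaler.inverse_transform(y_test.reshape(-1, 1)).ravel()
print(f"MAPE: {mape(actuals, preds):.2f}%")
print(f"Directional accuracy: {directional_accuracy(actuals, preds):.2%}")

For visual inspection, a single line plot of actuals and preds over the test index is usually enough to reveal lagged or flat predictions that scalar metrics hide.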
Conclusion
LSTMs are powerful tools for time series forecasting, but they're not magic. The fundamentals — proper data preparation, careful windowing, appropriate normalization, and honest evaluation — matter more than architectural complexity. Start simple, validate rigorously, and add complexity only when justified by the data.