Time Series Data Leakage

Why random CV fails with temporal data

Random Cross-Validation

Wrong

Splits the data at random, ignoring time order, so observations from after the test period leak into the training set.

[Figure: Folds 1 to 3, each tested on random samples, on a 2020 to 2024 timeline]
⚠️ Training on 2024 data to predict 2021 means the model has seen information from the future!
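
The leak described above can be made concrete with a small sketch. It assumes 60 time-ordered observations (index 0 is the oldest) and uses two hypothetical helpers, `random_kfold` and `leaky_train_count`, written here only for illustration. For each shuffled fold it counts how many training points fall after the earliest test point.

```python
import random

def random_kfold(n_samples, n_splits, seed=0):
    """Yield (train, test) index lists from a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # time order is discarded here
    fold_size = n_samples // n_splits
    for k in range(n_splits):
        start = k * fold_size
        # The last fold absorbs any remainder.
        stop = n_samples if k == n_splits - 1 else start + fold_size
        test = sorted(indices[start:stop])
        train = sorted(indices[:start] + indices[stop:])
        yield train, test

def leaky_train_count(train, test):
    """Count training points that occur after the earliest test point."""
    earliest_test = min(test)
    return sum(1 for i in train if i > earliest_test)

# Assumed data: 60 monthly observations, index 0 = oldest, 59 = newest.
leak_counts = [leaky_train_count(tr, te) for tr, te in random_kfold(60, 3)]
print(leak_counts)
```

Any nonzero count means that fold trains on observations from after the start of its own test period, which is exactly the future-to-past leak the warning describes.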

Forward Chaining CV

Correct

Preserves temporal order. Each fold trains only on data that precedes its test period.

[Figure: three folds on a 2020 to 2024 timeline]
Fold 1 - Train on past, test on future
Fold 2 - Expanding window
Fold 3 - Always forward in time
Legend: Training Data, Test Data, Data Leakage (Future → Past)
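
A minimal sketch of the forward chaining scheme above, assuming the samples are already sorted oldest first. The function name `forward_chaining_splits` and the fold sizing are illustrative choices, not a reference implementation. The sizing is similar in spirit to scikit-learn's `TimeSeriesSplit`, though this sketch does not use that library.

```python
def forward_chaining_splits(n_samples, n_splits):
    """Yield (train, test) index ranges for an expanding-window split.

    Samples are assumed to be sorted by time, oldest first.
    """
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    test_size = n_samples // (n_splits + 1)
    # Whatever is not reserved for testing seeds the first training window.
    first_train_end = n_samples - n_splits * test_size
    for k in range(n_splits):
        train_end = first_train_end + k * test_size
        # Train on everything before train_end, test on the next block.
        yield range(0, train_end), range(train_end, train_end + test_size)

# Assumed data: 60 monthly observations, index 0 = oldest, 59 = newest.
splits = [(list(tr), list(te)) for tr, te in forward_chaining_splits(60, 3)]
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Each fold's training window grows while its test block moves forward, so every training index is earlier than every test index in the same fold.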