Stratified Sampling

Preserving class distribution in train-test splits

Original Dataset (1000 samples)
Class A: 80% (800)
Class B: 20% (200)

Random Sampling

Problem

Random selection can skew class distribution in splits

Training (75%)
--%A / --%B
Test (25%)
--%A / --%B
Class Distribution Variance
Train
Test

Stratified Sampling

Solution

Maintains original class proportions in all splits

Training (75%)
80%A / 20%B
Test (25%)
80%A / 20%B
Class Distribution (Preserved)
Train
Test