Model Scaling / Emergent Threshold
The observation that chain-of-thought (CoT) prompting's effectiveness depends critically on model size.
At 100B+ parameters, CoT yields large gains (17% → 58% accuracy on GSM8K, an improvement of about 41 percentage points). Below 100B, the improvement is minimal (< 2%). This points to a sharp threshold in model capacity: below it, CoT gives little or no benefit; above it, prompting for intermediate steps unlocks multi-step reasoning.
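To make "threshold" concrete, the sketch below locates it in a scaling sweep. Each model gets a standard-prompting accuracy and a CoT accuracy. The function returns the smallest model size at which the CoT gain clears a cutoff and stays above it for every larger model. This is a minimal illustration under stated assumptions. The names `ScalingPoint` and `find_cot_threshold` and the default 5-point cutoff are hypothetical choices, not part of any published evaluation protocol.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScalingPoint:
    """One model in a scaling sweep (hypothetical field names)."""
    params_billions: float  # model size in billions of parameters
    standard_acc: float     # accuracy (%) with standard few-shot prompting
    cot_acc: float          # accuracy (%) with chain-of-thought prompting

    @property
    def cot_gain(self) -> float:
        """Absolute CoT improvement, in percentage points."""
        return self.cot_acc - self.standard_acc


def find_cot_threshold(
    points: Sequence[ScalingPoint], min_gain: float = 5.0
) -> Optional[float]:
    """Return the smallest model size whose CoT gain is at least `min_gain`
    points AND for which every larger model also clears the cutoff.
    Returns None if no such size exists. The 5-point default is arbitrary."""
    threshold: Optional[float] = None
    # Walk from smallest to largest model.
    for p in sorted(points, key=lambda q: q.params_billions):
        if p.cot_gain >= min_gain:
            if threshold is None:
                threshold = p.params_billions  # candidate start of the regime
        else:
            threshold = None  # a larger model fell below the cutoff: reset
    return threshold
```

The reset on any sub-cutoff model is a deliberate choice. A single noisy small model that happens to benefit from CoT will not be reported as the threshold unless every model above it also benefits. This matches the "below it, no help; above it, help" reading of a sharp threshold.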