Compute Budget (C)
The total computational resources available for training, measured in FLOPs (floating-point operations). A common approximation is C ≈ 6ND, where N is the number of model parameters and D is the number of training tokens: training costs roughly 6 floating-point operations per parameter per token.
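
To make the approximation concrete, here is a minimal Python sketch of C ≈ 6ND and its rearrangement for D. The function names and the example sizes (1 billion parameters, 20 billion tokens) are illustrative assumptions, not values from this glossary, and the result is only a rough estimate.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate training compute C (in FLOPs) using C ≈ 6 * N * D."""
    return 6.0 * n_params * n_tokens


def tokens_for_budget(compute_budget: float, n_params: float) -> float:
    """Rearrange C ≈ 6 * N * D to estimate the tokens D affordable for a given C and N."""
    return compute_budget / (6.0 * n_params)


# Hypothetical example: a 1e9-parameter model trained on 2e10 tokens.
c = training_flops(1e9, 2e10)
print(f"Estimated compute: {c:.2e} FLOPs")

# Going the other way: how many tokens fit in that budget for the same model size?
d = tokens_for_budget(c, 1e9)
print(f"Affordable tokens: {d:.2e}")
```

Under these assumed sizes the estimate comes to about 1.2 × 10^20 FLOPs. Because the factor of 6 is itself approximate, the output is best read as an order-of-magnitude guide.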