n (number of experts)
Total number of expert networks in one MoE layer. The paper experiments with values from 4 to 131,072. In practice, 8–2,048 experts per layer is common. The total parameter count grows with n, because each additional expert adds its own weights. Per-token compute instead grows with k, the number of experts active for each token, and is largely independent of n.
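
The scaling above can be illustrated with a rough sketch. It assumes each expert is a two-layer feed-forward network without biases and ignores the gating network. The function name moe_layer_cost and the sizes d_model and d_hidden are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def moe_layer_cost(n, k, d_model=512, d_hidden=1024):
    """Return (total_expert_params, active_params_per_token) for one MoE layer.

    Assumption: every expert is a two-layer feed-forward network
    (d_model -> d_hidden -> d_model) with no biases. The gating
    network is ignored in this estimate.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Two weight matrices per expert: d_model x d_hidden and d_hidden x d_model.
    params_per_expert = 2 * d_model * d_hidden
    total_params = n * params_per_expert    # grows with n
    active_params = k * params_per_expert   # grows with k only
    return total_params, active_params


if __name__ == "__main__":
    # Hold k fixed and vary n: the total grows, the active count does not.
    for n in (8, 64, 2048):
        total, active = moe_layer_cost(n, k=2)
        print(f"n={n}: total={total}, active per token={active}")
```

With k held fixed, raising n from 8 to 2,048 multiplies the total expert parameters by 256 under these assumptions, while the number of parameters touched per token stays the same.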