n (number of experts)

Appears in 1 paper

Total number of expert networks in one MoE layer.

As used in Paper 09 — Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer →

The paper experiments with values of n from 4 to 131,072. In practice, 8–2,048 experts per layer is common. Total model parameters grow roughly linearly with n, because each added expert brings its own weights, while per-token compute scales with k, the number of experts active for each token, not with n.
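
The split between parameters (driven by n) and per-token compute (driven by k) can be illustrated with a rough count. This is a minimal sketch, not the paper's accounting: it assumes each expert is a two-matrix feed-forward network, ignores biases and the gating network, and the names `expert_params`, `moe_layer_counts`, `d_model`, and `d_hidden`, along with the sizes used, are hypothetical choices made for illustration.

```python
def expert_params(d_model: int, d_hidden: int) -> int:
    """Weights in one expert, assumed to be d_model -> d_hidden -> d_model.

    Assumption: two weight matrices, biases ignored.
    """
    return 2 * d_model * d_hidden


def moe_layer_counts(n: int, k: int, d_model: int, d_hidden: int) -> dict:
    """Return expert weights stored in the layer vs. weights used per token.

    n: total number of experts in the layer.
    k: number of experts active for each token.
    The gating network is ignored (an assumption of this sketch).
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    per_expert = expert_params(d_model, d_hidden)
    return {
        "total_expert_params": n * per_expert,      # grows with n
        "active_params_per_token": k * per_expert,  # grows with k only
    }


if __name__ == "__main__":
    # Illustrative sizes only; k is held fixed while n varies.
    for n in (4, 256, 131072):
        counts = moe_layer_counts(n=n, k=4, d_model=512, d_hidden=1024)
        print(n, counts)
```

With k held at 4, raising n from 4 to 256 multiplies the stored expert weights by 64 in this sketch, while the weights touched per token stay the same.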