Expert

Appears in 1 paper

One of n specialised feed-forward networks in an MoE layer.

As used in Paper 09 (Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer)

One of n specialised feed-forward networks in an MoE layer. Each expert is a standard two-layer MLP (the same structure as the Transformer FFN) with its own learned weight matrices. Experts do not share weights. Because the gating network routes each input to only a small subset of experts, each expert is trained only on the inputs sent to it, and this routing pressure leads different experts to specialise in different types of inputs.
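The following is a minimal sketch of the structure described above, not the paper's implementation. It assumes a ReLU hidden layer and a plain top-k softmax gate. The paper's gate also adds tunable noise before the top-k selection, and that noise is omitted here. The names `Expert`, `top_k_gate` and `moe_forward`, along with all sizes, are hypothetical. Each expert draws its own weights, and only the k selected experts are evaluated for a given input.

```python
import math
import random


class Expert:
    """Hypothetical two-layer MLP expert: y = W2 . relu(W1 . x + b1) + b2."""

    def __init__(self, d_model, d_hidden, rng):
        # Every expert draws its own parameters; nothing is shared between experts.
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
        self.b1 = [0.0] * d_hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
        self.b2 = [0.0] * d_model

    def __call__(self, x):
        # Hidden layer with ReLU, then projection back to d_model.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]


def top_k_gate(x, w_gate, k):
    """Noise-free top-k gate: softmax over the k largest logits, zero elsewhere."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in w_gate]  # one per expert
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}  # {expert index: gate weight}


def moe_forward(x, experts, w_gate, k):
    """Weighted sum of the outputs of the k selected experts only."""
    gates = top_k_gate(x, w_gate, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates


rng = random.Random(0)
d_model, d_hidden, n_experts, k = 4, 8, 4, 2
experts = [Expert(d_model, d_hidden, rng) for _ in range(n_experts)]
w_gate = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
x = [1.0, -0.5, 0.25, 2.0]
y, gates = moe_forward(x, experts, w_gate, k)
print(len(y), len(gates))  # prints: 4 2
```

Since gradients only reach the experts selected by the gate, an expert's weights are updated only by the inputs routed to it. This is how routing and specialisation reinforce each other during training.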