Expert
One of n specialised feed-forward networks in an MoE layer. Each expert is a standard two-layer MLP (structurally identical to the Transformer FFN) with its own learned weight matrices. Experts do not share weights; each comes to specialise in different types of input over the course of training, driven by the routing decisions of the gating network.
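
The structure described above can be illustrated with a minimal pure-Python sketch. It is not taken from any particular MoE implementation: the names Expert and make_experts, the sizes d_model and d_hidden, the ReLU activation, and the Gaussian initialisation are all assumptions chosen for illustration. The gating network is deliberately left out; the sketch only shows that each expert is a two-layer MLP holding its own weights.

```python
import random


class Expert:
    """Hypothetical two-layer MLP: d_model -> d_hidden -> d_model."""

    def __init__(self, d_model, d_hidden, rng):
        # Each expert owns its own weight matrices; nothing is shared.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)]
                   for _ in range(d_model)]
        self.b1 = [0.0] * d_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(d_model)]
                   for _ in range(d_hidden)]
        self.b2 = [0.0] * d_model

    def __call__(self, x):
        # First layer followed by ReLU (the activation is an assumption).
        hidden = [
            max(0.0, sum(x[i] * self.w1[i][j] for i in range(len(x))) + self.b1[j])
            for j in range(len(self.b1))
        ]
        # Second layer projects back to the model dimension.
        return [
            sum(hidden[j] * self.w2[j][k] for j in range(len(hidden))) + self.b2[k]
            for k in range(len(self.b2))
        ]


def make_experts(n, d_model, d_hidden, seed=0):
    """Build n independently initialised experts."""
    rng = random.Random(seed)
    return [Expert(d_model, d_hidden, rng) for _ in range(n)]


experts = make_experts(n=4, d_model=8, d_hidden=32)
token = [0.5] * 8
outputs = [expert(token) for expert in experts]  # one output vector per expert
```

In a real MoE layer the gating network would pick only a few of these experts per token and combine their outputs; here every expert is called only to show that the same input passes through separate sets of weights.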