Dense computation
The standard approach in neural networks: every parameter fires for every input.
The standard approach in neural networks: every parameter fires for every input. A dense FFN with 1 billion parameters uses all 1 billion for every token. Contrasted with sparse computation, where only a subset of parameters activates per input.