Receptive Field

Appears in 2 papers

In deep networks, the range of input positions that influence a given output position.

As used in Paper 18 — Mistral 7B →

With sliding window attention (SWA), each layer attends only to the previous W tokens, but stacking layers lets information propagate further: after k layers, the receptive field grows to approximately k × W tokens. With 32 layers and W = 4,096, the theoretical receptive field is 32 × 4,096 = 131,072 (~131K) tokens, so the model can implicitly draw on information from up to ~131K tokens back even though each layer explicitly attends to only 4K.
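The k × W figure can be checked with a small sketch. The function names below are hypothetical and not taken from the Mistral codebase. The simulation assumes that position i attends to positions i − W through i in every layer, and it tracks only which positions can reach which, not actual attention weights.

```python
def theoretical_receptive_field(num_layers: int, window: int) -> int:
    """Upper bound on how far back information can travel: k * W tokens."""
    return num_layers * window


def simulated_reach(seq_len: int, num_layers: int, window: int) -> int:
    """How many tokens back the last position can 'see' after num_layers layers.

    Assumption: in each layer, position i attends to positions i - window .. i.
    """
    # earliest[i] = smallest input position that can influence position i so far
    earliest = list(range(seq_len))
    for _ in range(num_layers):
        earliest = [
            min(earliest[max(0, i - window): i + 1]) for i in range(seq_len)
        ]
    last = seq_len - 1
    return last - earliest[last]


if __name__ == "__main__":
    # Mistral 7B figures quoted above: 32 layers, W = 4,096
    print(theoretical_receptive_field(32, 4096))
    # Toy check: 3 layers with a window of 4 reach 3 * 4 tokens back
    print(simulated_reach(seq_len=50, num_layers=3, window=4))
```

In the toy run, the simulated reach matches k × W as long as the sequence is long enough. For shorter sequences the reach is capped by the sequence length.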

As used in Paper 19 — Ring Attention with Blockwise Transformers for Near-Infinite Context →

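In Ring Attention, the range of input tokens that influence an output position. Unlike Mistral's SWA, which limits each layer to a local window, Ring Attention computes exact attention over the whole sequence, so every layer has a full receptive field (all n tokens) and does not depend on stacking layers to extend its reach.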
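As a rough illustration of the contrast, the sketch below compares how far back a single layer reaches under full attention and under a sliding window. It assumes a causal mask in both cases; `earliest_visible` is a hypothetical helper, not part of either paper's code.

```python
from typing import Optional


def earliest_visible(i: int, window: Optional[int] = None) -> int:
    """Earliest position that position i attends to directly in one causal layer.

    window=None models full attention (as Ring Attention computes);
    an integer models a sliding window spanning positions i - window .. i.
    """
    if window is None:
        return 0
    return max(0, i - window)


if __name__ == "__main__":
    last = 15  # last position of a 16-token toy sequence
    print("full attention:", earliest_visible(last))
    print("sliding window (W=4):", earliest_visible(last, window=4))
```

With full attention the last position reaches position 0 in a single layer, while the windowed layer stops W positions back and needs further layers to see earlier tokens.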