Inference vs Training

Appears in 1 paper

Inference: generating tokens one-by-one (autoregressive).

As used in Paper 19: Ring Attention with Blockwise Transformers for Near-Infinite Context

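Inference: generating tokens one at a time, each conditioned on the tokens before it (autoregressive). Training: processing all positions of a full sequence in parallel, using teacher forcing with a causal mask, so the model is still autoregressive even though the computation is not sequential. Ring Attention applies to both. At inference, its benefit comes from the longer context it makes feasible, which can enable better predictions.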
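
The contrast between the two modes can be sketched in a few lines of Python. This is a toy illustration only, not Ring Attention and not code from the paper: `predict_next` is a hypothetical stand-in for a causal language model, and `VOCAB_SIZE`, `training_pass`, and `generate` are names invented for this example.

```python
VOCAB_SIZE = 10  # hypothetical toy vocabulary size


def predict_next(prefix):
    """Toy stand-in for a causal model: maps a prefix to one next token."""
    return (3 * sum(prefix) + len(prefix)) % VOCAB_SIZE


def training_pass(tokens):
    """Teacher forcing: each position sees only ground-truth tokens,
    so the positions are independent and could be computed in parallel."""
    return [predict_next(tokens[: t + 1]) for t in range(len(tokens) - 1)]


def generate(prompt, num_new_tokens):
    """Autoregressive inference: each step consumes the token produced
    by the previous step, so the steps must run one after another."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(predict_next(tokens))
    return tokens


sequence = generate([1, 2], 4)
predictions = training_pass(sequence)

# Scoring the generated sequence in one training-style pass reproduces
# the tokens that inference produced step by step.
assert predictions[1:] == sequence[2:]
```

In `training_pass` no prediction depends on another prediction, which is what lets a real model handle the whole sequence at once. In `generate` the loop cannot be parallelized across steps, because each new token becomes part of the next step's input.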