Autoregressive Language Modeling

Appears in 1 paper

A training objective where the model learns to predict the next token given all previous tokens.
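Written out with the standard chain-rule factorization, the objective looks like this. The notation ($x_t$ for the token at position $t$, $x_{<t}$ for all earlier tokens, $\theta$ for the model parameters) is ours, not the source's. The loss for one sequence of length $T$ is the negative log-likelihood summed over positions:

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
```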

As used in Paper 12 (Language Models are Few-Shot Learners):

GPT-3 is trained to predict each next token from all of the tokens before it. The objective is causal: every prediction looks only backward at earlier tokens, never forward at later ones. It is the same objective GPT-1 used, and the decoder-only Transformer architecture is largely unchanged.
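A minimal pure-Python sketch of what "causal" means in practice, assuming a toy setup: `causal_mask`, `next_token_pairs`, `autoregressive_nll`, and `uniform_model` are hypothetical helpers written for illustration, not part of GPT-3 or any library. Each target token is paired only with its prefix, and the mask shows which earlier positions each position may attend to. The uniform "model" is a stand-in for a trained network.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

Prefix = Tuple[str, ...]
ProbFn = Callable[[Prefix, str], float]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may look at position j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def next_token_pairs(tokens: Sequence[str]) -> List[Tuple[Prefix, str]]:
    """Pair each token with every token before it (the first token gets an empty prefix)."""
    return [(tuple(tokens[:t]), tokens[t]) for t in range(len(tokens))]


def autoregressive_nll(tokens: Sequence[str], prob_fn: ProbFn) -> float:
    """Average negative log-likelihood of each token given only the tokens before it."""
    pairs = next_token_pairs(tokens)
    total = -sum(math.log(prob_fn(prefix, target)) for prefix, target in pairs)
    return total / len(pairs)


def uniform_model(vocab: Set[str]) -> ProbFn:
    """Hypothetical stand-in for a trained model: ignores the prefix and
    assigns equal probability to every vocabulary token (assumes target is in vocab)."""
    def prob(prefix: Prefix, target: str) -> float:
        return 1.0 / len(vocab)
    return prob


if __name__ == "__main__":
    tokens = ["the", "cat", "sat", "down"]
    for prefix, target in next_token_pairs(tokens):
        print(prefix, "->", target)
    # A uniform model over 4 tokens scores log(4) per token on average.
    print(autoregressive_nll(tokens, uniform_model(set(tokens))))
```

In a real Transformer all positions are scored in one parallel pass, and the mask is what keeps position $t$ from seeing tokens after it. A trained model would assign higher probability to likely continuations and so reach a lower loss than the uniform baseline.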

