Autoregressive Language Modeling
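A training objective where the model learns to predict the next token given all previous tokens. The objective is causal: each prediction conditions only on earlier tokens, never on later ones. GPT-1 was trained with this same objective, and the core architecture (a decoder-only Transformer) has not changed since.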
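
As a rough illustration of the objective, the sketch below splits a token sequence into (context, target) pairs and averages the negative log-likelihood of each target given its prefix. The names `next_token_pairs`, `autoregressive_nll`, and `uniform_model` are hypothetical, the toy "model" simply assigns equal probability to a three-word vocabulary, and the first token is left unpredicted because no start-of-sequence token is assumed.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Context = Tuple[str, ...]


def next_token_pairs(tokens: Sequence[str]) -> List[Tuple[Context, str]]:
    """Build (context, target) pairs; each context holds only earlier tokens."""
    return [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


def autoregressive_nll(
    tokens: Sequence[str],
    predict: Callable[[Context], Dict[str, float]],
) -> float:
    """Average negative log-likelihood of each token given its prefix."""
    pairs = next_token_pairs(tokens)
    total = 0.0
    for context, target in pairs:
        probs = predict(context)          # distribution over the next token
        total -= math.log(probs[target])  # penalize low probability on the true token
    return total / len(pairs)


def uniform_model(context: Context) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model: ignores the context entirely."""
    vocab = ("the", "cat", "sat")
    return {tok: 1.0 / len(vocab) for tok in vocab}


sentence = ["the", "cat", "sat"]
pairs = next_token_pairs(sentence)
loss = autoregressive_nll(sentence, uniform_model)
```

Here `pairs` contains two training examples, predicting "cat" from ("the",) and "sat" from ("the", "cat"). Because the uniform model gives every token probability 1/3, the loss equals log 3; a trained model would aim to push this value lower by placing more probability on the true next token.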