Language model
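A probability distribution over sequences of tokens. A language model assigns a probability P(u₁,...,uₙ) to any token sequence and can be sampled to generate text. Autoregressive language models factorize this joint probability with the chain rule, P(u₁,...,uₙ) = ∏ₜ P(uₜ | u₁,...,uₜ₋₁), so each step only has to predict the next token from the tokens before it. GPT-1 is an autoregressive (left-to-right) language model of this kind: it models P(uₜ | u₁,...,uₜ₋₁), in practice conditioning on a fixed-size context window of preceding tokens.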
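The sketch below shows both uses of an autoregressive language model: scoring a sequence with the chain rule and generating text by sampling one token at a time. It is a minimal illustration, not GPT-1. The vocabulary, the probability table `NEXT`, the `<s>`/`</s>` boundary tokens, and the helper names are all hypothetical, and the toy model conditions only on the previous token (a bigram, i.e. a context window of 1) instead of a long window.

```python
import math
import random

# Hypothetical toy model: P(next token | previous token). A bigram is the
# simplest autoregressive model; GPT-1 conditions on many preceding tokens.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.3, "dog": 0.7},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def cond_prob(token, context):
    """P(u_t | u_1..u_{t-1}); this toy model only looks at the last token."""
    return NEXT[context[-1]].get(token, 0.0)

def sequence_log_prob(tokens):
    """Chain rule: log P(u_1..u_n) = sum over t of log P(u_t | u_1..u_{t-1})."""
    context = ["<s>"]
    total = 0.0
    for tok in tokens:
        p = cond_prob(tok, context)
        if p == 0.0:
            return float("-inf")  # impossible sequence under this model
        total += math.log(p)
        context.append(tok)
    return total

def sample(rng, max_len=10):
    """Generate text by repeatedly drawing u_t from P(u_t | context)."""
    context = ["<s>"]
    out = []
    for _ in range(max_len):
        dist = NEXT[context[-1]]
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if tok == "</s>":
            break
        out.append(tok)
        context.append(tok)
    return out

# P(the cat </s>) = 0.6 * 0.5 * 1.0, i.e. about 0.3 up to float rounding.
print(math.exp(sequence_log_prob(["the", "cat", "</s>"])))
print(sample(random.Random(0)))
```

Working in log space and summing avoids numerical underflow when many per-token probabilities are multiplied, which is why sequence scores are usually reported as log-probabilities.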