Positional encoding (PE)

Appears in 2 papers

A vector added to each input embedding to inject position information. It can be fixed (as in the original Transformer) or learned.

As used in Paper 08 — Attention Is All You Need →

A fixed vector added to each input embedding to inject position information. The original paper uses PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), where pos is the token position and i indexes pairs of embedding dimensions. Each position gets a unique vector, and nearby positions have similar vectors. The encoding is added once, before the first layer, and is not learned.
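The formula can be sketched in NumPy as follows. This is a minimal illustration, not the paper's reference code: the function names `sinusoidal_positional_encoding` and `add_positional_encoding` are hypothetical, and the sketch assumes an even `d_model` and a 2-D embedding array of shape (sequence length, d_model).

```python
import numpy as np


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) array of fixed sinusoidal encodings."""
    if d_model % 2 != 0:
        # Assumption of this sketch: dimensions come in (sin, cos) pairs.
        raise ValueError("d_model must be even for this sketch")
    positions = np.arange(max_len)[:, np.newaxis]        # shape (max_len, 1)
    pair_index = np.arange(d_model // 2)[np.newaxis, :]  # shape (1, d_model / 2)
    # angle(pos, i) = pos / 10000^(2i / d_model)
    angles = positions / np.power(10000.0, 2 * pair_index / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)  # odd dimensions:  PE(pos, 2i+1)
    return pe


def add_positional_encoding(embeddings: np.ndarray) -> np.ndarray:
    """Add the encoding to token embeddings once, before the first layer."""
    seq_len, d_model = embeddings.shape
    return embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Because the table depends only on `max_len` and `d_model`, it can be computed once and reused. Nothing in it is updated during training.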

As used in Paper 20 — Gemini: A Family of Highly Capable Multimodal Models →

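A learned or fixed representation of each token's position in the sequence, used so that the Transformer knows token order (position 0 comes before position 1). The fixed sinusoidal form is pos[i, 2j] = sin(i / 10000^(2j/d)) and pos[i, 2j+1] = cos(i / 10000^(2j/d)), where i is the position and d is the embedding dimension.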
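For contrast with the fixed formula, a learned variant can be sketched as a lookup table with one row per position. This is an illustrative assumption only: the entry does not say which variant the paper uses, and the names `init_learned_position_table` and `add_learned_positions` and the initialization scale of 0.02 are hypothetical. In a real model the table would be a trainable parameter updated by the optimizer; here it is simply initialized.

```python
import numpy as np


def init_learned_position_table(max_len: int, d_model: int,
                                scale: float = 0.02, seed: int = 0) -> np.ndarray:
    """Create a (max_len, d_model) table of position vectors.

    In training, these values would be updated by gradient descent;
    this sketch only performs the random initialization.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, scale, size=(max_len, d_model))


def add_learned_positions(embeddings: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Add row p of the table to the embedding at position p."""
    seq_len = embeddings.shape[0]
    if seq_len > table.shape[0]:
        # A learned table has no rows for positions it was never sized for.
        raise ValueError("sequence is longer than the position table")
    return embeddings + table[:seq_len]
```

One practical difference this makes visible: a learned table covers only positions up to its `max_len`, whereas the sinusoidal formula can be evaluated for any position.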

