Positional encoding (PE)
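A fixed vector added to each input embedding to inject position information. The original Transformer paper uses sinusoids: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(...) with the same argument, where pos is the token position and i indexes the sine/cosine dimension pairs (0 <= i < d_model/2). Each position gets a unique vector, and nearby positions have similar vectors. The encoding is added once, before the first layer, and is not learned.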
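The sinusoidal formula can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `sinusoidal_positional_encoding` and the parameter `max_len` are assumptions introduced here, and the sketch assumes an even `d_model`.

```python
import numpy as np


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) array of sinusoidal position vectors.

    Hypothetical helper for illustration; assumes d_model is even.
    """
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")

    pos = np.arange(max_len)[:, None]        # positions, shape (max_len, 1)
    i = np.arange(d_model // 2)[None, :]     # pair index, shape (1, d_model / 2)

    # Shared argument of the sine and cosine: pos / 10000^(2i / d_model)
    angle = pos / np.power(10000.0, 2 * i / d_model)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)              # even dimensions 2i
    pe[:, 1::2] = np.cos(angle)              # odd dimensions 2i + 1
    return pe


pe = sinusoidal_positional_encoding(max_len=50, d_model=16)

# Dot products give a rough sense of "nearby positions have similar vectors".
print("similarity of positions 10 and 11:", pe[10] @ pe[11])
print("similarity of positions 10 and 40:", pe[10] @ pe[40])
```

In a model, the rows of `pe` would be added to the token embeddings at the matching positions, once, before the first layer.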
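A learned or fixed representation of token position in the sequence. It is needed because self-attention by itself ignores token order, so without it the Transformer cannot tell that position 0 comes before position 1. Sinusoidal formula, with i the position, j the dimension-pair index, and d the embedding dimension: pos[i, 2j] = sin(i / 10000^(2j/d)) and pos[i, 2j+1] = cos(...) with the same argument.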
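As a rough illustration of why order information has to be injected, the sketch below runs a toy self-attention step on the same three words in two different orders. Everything in it is an assumption made for the example: the vocabulary, the random tables standing in for trained token embeddings and a learned position table, and the simplified single-head attention with no projection matrices followed by mean pooling.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # toy embedding size (assumption)
vocab = {"dog": 0, "bites": 1, "man": 2}     # hypothetical vocabulary
tok_emb = rng.normal(size=(len(vocab), d))   # random stand-in for token embeddings
pos_emb = rng.normal(size=(3, d))            # random stand-in for a learned position table


def embed(words, use_positions):
    """Look up token vectors and optionally add one position vector per slot."""
    x = tok_emb[[vocab[w] for w in words]]
    if use_positions:
        x = x + pos_emb[: len(words)]
    return x


def attend_and_pool(x):
    """Simplified self-attention (no learned projections), then mean pooling."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights @ x).mean(axis=0)


a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]

same_without = np.allclose(attend_and_pool(embed(a, False)),
                           attend_and_pool(embed(b, False)))
same_with = np.allclose(attend_and_pool(embed(a, True)),
                        attend_and_pool(embed(b, True)))

print("identical without positions:", same_without)
print("identical with positions:   ", same_with)
```

Without position vectors the two orderings are only a row permutation of each other, so the pooled attention output matches. Once a position-dependent vector is added to each slot, the two inputs are no longer permutations of one another and the outputs are expected to differ.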