Value (V)
The "what I send when selected" projection of each position: V = X · W^V, where X holds the input representations and W^V is a learned weight matrix. The attention output at each position is a weighted sum of the value vectors, with the attention weights as coefficients. Separating Keys (for matching) from Values (for content) lets the model learn a different representation for each role.
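As a rough illustration of the definition above, the following sketch computes V = X · W^V and then forms the attention output as a weighted sum of value vectors. It assumes the common scaled dot-product form with a softmax over scores, which this entry does not spell out. The toy matrices and the helper names (matmul, softmax, W_Q, W_K, W_V) are made up for the example; in a real model the projection matrices are learned.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


# Toy input: 3 positions, model width 2 (illustrative values only).
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

# Hypothetical projection matrices; real ones are learned during training.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[1.0, 0.0], [0.0, 1.0]]
W_V = [[2.0, 0.0], [0.0, 3.0]]

Q = matmul(X, W_Q)  # what each position is looking for
K = matmul(X, W_K)  # what each position is matched on
V = matmul(X, W_V)  # what each position sends when selected: V = X · W^V

# Assumption: scaled dot-product scores followed by a row-wise softmax.
d_k = len(K[0])
K_T = [list(col) for col in zip(*K)]
scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
weights = [softmax(row) for row in scores]

# Each output row is a weighted sum of the rows of V.
output = matmul(weights, V)

for i, row in enumerate(output):
    print(f"position {i}: {[round(v, 3) for v in row]}")
```

Because W_K and W_V differ here, the vectors used for matching (K) are not the vectors that get mixed into the output (V), which is the separation of roles the entry describes.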