Value (V)

Appears in 1 paper

The "what I send when selected" projection of each position.

As used in Paper 08 — Attention Is All You Need →

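Each position's value vector is a learned linear projection of its input: V = X · W^V, where X holds the input representations (one row per position) and W^V is a learned weight matrix. The output of attention at a position is a weighted sum of the value vectors, with the attention weights (computed from Queries and Keys) as the coefficients. Separating Keys (for matching) from Values (for content) lets the model learn different representations for these two roles.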
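To make the projection and the weighted sum concrete, here is a minimal single-head sketch in plain Python. It assumes scaled dot-product self-attention as in the paper; the helper names (matmul, softmax, attention) and all toy numbers are illustrative assumptions, not taken from the paper or from any library.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q = matmul(X, W_q)  # what each position is looking for
    K = matmul(X, W_k)  # what each position is matched against
    V = matmul(X, W_v)  # what each position sends when selected
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    output = matmul(weights, V)                 # weighted sum of value vectors
    return V, weights, output

# Toy input: 3 positions, model width 2 (hypothetical numbers).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 0.0], [0.0, 1.0]]
W_v = [[2.0, 0.0], [0.0, 3.0]]  # deliberately different from W_k

V, weights, output = attention(X, W_q, W_k, W_v)
```

Because W_v differs from W_k here, the vectors that get mixed into the output (V) are not the ones used to compute the match scores (K), which is the separation of roles described above.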

