Key (K)
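The "advertisement" projection of each position: K = X · W^K, where X holds the input representations (one row per position) and W^K is a learned weight matrix. Each position's key vector describes the information that position holds. Attention scores are computed by comparing each query with every key, typically with a dot product.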
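As a rough illustration, the projection and the query-key comparison can be sketched in NumPy. The dimensions, the random weights, the query matrix W_Q, and the 1/sqrt(d_k) scaling are assumptions chosen for the example (the scaling follows the common scaled dot-product convention); none of them come from this entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 positions, model width 8, key/query width 3.
seq_len, d_model, d_k = 4, 8, 3

X = rng.normal(size=(seq_len, d_model))   # one row per position
W_K = rng.normal(size=(d_model, d_k))     # stand-in for the learned key weights
W_Q = rng.normal(size=(d_model, d_k))     # stand-in for the learned query weights (assumed)

K = X @ W_K   # keys: what each position "advertises"
Q = X @ W_Q   # queries: what each position is looking for

# scores[i, j] compares the query at position i with the key at position j.
# Dividing by sqrt(d_k) is the usual scaled dot-product convention (assumed here).
scores = Q @ K.T / np.sqrt(d_k)

print(K.shape)       # one d_k-dimensional key per position
print(scores.shape)  # one score per (query position, key position) pair
```

Each row of K is the key for one position, so row i of scores is the result of comparing position i's query against every key in the sequence.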