Self-attention
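Attention where the queries (Q), keys (K), and values (V) are all computed from the same sequence. Every position attends to every position in that sequence, itself included, unless a mask (such as a causal mask) restricts which positions are visible. This lets the model relate any two positions in a single step (e.g., a pronoun attending to its antecedent), whereas in an RNN the same information must pass through every intermediate time step between them.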
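
As an illustration, here is a minimal sketch of single-head self-attention in plain Python. It assumes the common scaled dot-product form, softmax(QK^T / sqrt(d_k)) V, which the entry above does not specify. The names `self_attention`, `w_q`, `w_k`, and `w_v` are hypothetical, and the projection matrices are set to the identity only to keep the numbers easy to follow.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (assumed form).

    x is an n x d_model matrix with one row per position.
    w_q, w_k, w_v are hypothetical d_model x d_k projection matrices.
    Returns (output, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    # Q, K, and V are all projections of the same input sequence x.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Score every position against every position, then scale.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row of weights is a probability distribution over positions.
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

# Toy sequence of three positions with two features each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and covers all three positions, so the output at any position can draw directly on any other position without passing through the ones in between.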