Attention weight (αₜᵢ)
A probability-like number between 0 and 1 that represents how strongly the decoder, at decoding step t, focuses on source position i. For a fixed step t, the weights over all source positions i sum to 1. They are computed by applying softmax to the raw alignment scores.
A probability-like value between 0 and 1, produced by applying softmax to the scaled attention scores. It represents how much one position attends to another. All weights for a given query position sum to 1. Collected over every pair of positions in a length-T sequence, the weights form the (T × T) attention weight matrix A.
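
As a rough illustration of the definition above, the sketch below builds a small attention weight matrix A and checks the properties listed in this entry. It assumes scaled dot-product scores (dot product divided by sqrt(d_k)), which this entry does not spell out, and it assumes that row t of A holds the weights for query position t. The function names and the toy input X are hypothetical, chosen only for this example.

```python
import math


def softmax(scores):
    """Map a list of raw scores to weights in (0, 1) that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return the (T x T) weight matrix A, one softmax row per query.

    Assumption: scores are dot products scaled by sqrt(d_k).
    """
    scale = math.sqrt(len(keys[0]))
    A = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        A.append(softmax(scores))
    return A


# Hypothetical toy input: T = 3 positions, each a 2-dimensional vector.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = attention_weights(X, X)

for t, row in enumerate(A):
    print(t, [round(a, 3) for a in row], "row sum:", round(sum(row), 6))
```

Each printed row is the set of weights for one query position. The softmax guarantees that every entry lies strictly between 0 and 1 and that each row sums to 1, up to floating-point rounding.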