Logit
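A raw, unnormalised score that is passed to softmax. In scaled dot-product attention, Q · Kᵀ / √dₖ is a matrix of logits, one row per query and one column per key. Applying softmax along each row turns them into attention weights: non-negative values that sum to 1 and can be read as probabilities.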
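As a rough illustration of the definition above, the sketch below computes attention logits and then normalises them with softmax. It uses plain Python lists rather than a tensor library, and the function names (`attention_logits`, `softmax`) and the toy values of `Q` and `K` are assumptions made up for this example, not part of any particular framework.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_logits(Q, K):
    """Return Q · Kᵀ / √dₖ as a list of rows (one row per query)."""
    d_k = len(K[0])  # key dimension
    scale = math.sqrt(d_k)
    return [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in K]
        for q in Q
    ]

# Toy inputs (hypothetical): 2 queries, 3 keys, d_k = 4.
Q = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 2.0, 0.0, 2.0]]
K = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]]

logits = attention_logits(Q, K)             # raw scores: [[1.0, 0.0, 1.0], [0.0, 2.0, 2.0]]
weights = [softmax(row) for row in logits]  # attention weights, each row sums to 1

for row_logits, row_weights in zip(logits, weights):
    print(row_logits, "->", [round(w, 3) for w in row_weights])
```

The logits can be any real number, positive or negative. Only after softmax do the values in each row become non-negative and sum to 1.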