Softmax
A function that turns a vector of raw scores into a probability distribution.
The function that converts raw alignment scores into attention weights. softmax(eᵢ) = exp(eᵢ) / Σⱼ exp(eⱼ). Guarantees all outputs are positive and sum to 1. See the Softmax Function tutorial.
A function that converts a vector of scores into a probability distribution (non-negative numbers summing to 1). Each element represents the probability of a token being the next one to generate.
A function that converts raw scores into probabilities: softmax(x_i) = exp(x_i) / Σ_k exp(x_k). Used in attention to ensure weights sum to 1.
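The formula in the definitions above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not an implementation taken from any of the listed papers. The function name `softmax` and the example scores are chosen here for illustration, and subtracting the maximum score before exponentiating is a common numerical-stability trick that is assumed here rather than stated in the definitions.

```python
import math

def softmax(scores):
    """Convert a non-empty list of raw scores into probabilities that sum to 1."""
    # Assumption: shift by the maximum score to avoid overflow in exp().
    # The shift cancels in the ratio, so the result matches
    # exp(x_i) / sum_k exp(x_k) from the definition.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores, e.g. alignment scores or next-token logits.
weights = softmax([2.0, 1.0, 0.1])
print(weights)       # every entry is positive, largest score gets largest weight
print(sum(weights))  # sums to 1 up to floating-point rounding
```

Without the shift, very large scores such as 1000.0 would overflow `math.exp`; with it, two equal scores still come out as equal weights.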
Appears in papers
Paper 05 — Efficient Estimation of Word Representations in Vector Space (Word2Vec) →
Paper 06 — Sequence to Sequence Learning with Neural Networks →
Paper 07 — Neural Machine Translation by Jointly Learning to Align and Translate →
Paper 12 — Language Models are Few-Shot Learners →
Paper 20 — Gemini: A Family of Highly Capable Multimodal Models →