Softmax

Appears in 5 papers

A function that turns a vector of raw scores into a probability distribution.
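A minimal Python sketch of the definition above, assuming the scores arrive as a plain list of floats. The name `softmax` is our own choice and is not taken from any of the papers. Before exponentiating, it subtracts the largest score. This is a common implementation trick: the result does not change, because the shared factor exp(−max) cancels in the ratio, but large scores no longer overflow `math.exp`.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into a probability distribution.

    Subtracting the maximum score first leaves the result unchanged
    (the common factor exp(-max) cancels between numerator and
    denominator) but keeps math.exp from overflowing on large inputs.
    """
    if not scores:
        raise ValueError("softmax needs at least one score")
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]

# A naive exp(1000.0) would overflow; the shifted version does not.
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

The largest raw score receives the largest probability, and the outputs always sum to 1.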

As used in Paper 05 — Efficient Estimation of Word Representations in Vector Space (Word2Vec)

A function that turns a vector of raw scores into a probability distribution.

As used in Paper 06 — Sequence to Sequence Learning with Neural Networks

A function that turns a vector of raw scores into a probability distribution.

As used in Paper 07 — Neural Machine Translation by Jointly Learning to Align and Translate

The function that converts raw alignment scores into attention weights: softmax(eᵢ) = exp(eᵢ) / Σⱼ exp(eⱼ). This guarantees that all outputs are positive and sum to 1. See the Softmax Function tutorial.
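A small sketch that applies the formula literally, reusing the notation eᵢ. The alignment scores are invented for illustration and do not come from the paper, and the helper name `attention_weights` is hypothetical. It checks the two guarantees stated above and also shows that the ordering of the scores carries over to the weights, so the position with the highest alignment score gets the most attention.

```python
import math


def attention_weights(e):
    """softmax(e_i) = exp(e_i) / sum_j exp(e_j), written directly from the formula.

    No max-subtraction here, which is fine for small illustrative scores.
    """
    denom = sum(math.exp(e_j) for e_j in e)
    return [math.exp(e_i) / denom for e_i in e]


# Hypothetical alignment scores for four source positions.
scores = [0.5, 2.0, -1.0, 0.0]
alpha = attention_weights(scores)

for i, (e_i, a_i) in enumerate(zip(scores, alpha)):
    print(f"position {i}: score {e_i:+.1f} -> weight {a_i:.3f}")
print("sum of weights:", sum(alpha))
```

Position 1 has the highest score, so it receives the largest weight. Every weight is still strictly positive, which means no source position is ever assigned exactly zero attention.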

As used in Paper 12 — Language Models are Few-Shot Learners

A function that converts a vector of scores into a probability distribution (non-negative numbers summing to 1). Each element represents the probability of a token being the next one to generate.
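For next-token prediction, the scores are the model's raw outputs (logits), one per vocabulary entry. The sketch below uses a made-up four-word vocabulary and made-up logits. These are purely illustrative and not from GPT-3. It shows the resulting distribution and two simple ways of picking a token from it: take the most probable one, or sample in proportion to the probabilities. These decoding choices are examples only and do not describe the paper's own settings.

```python
import math
import random

# Hypothetical toy vocabulary and logits; a real model scores a far larger vocabulary.
vocab = ["the", "cat", "sat", "mat"]
logits = [1.2, 3.1, 0.4, 2.5]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability; result is unchanged
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


probs = softmax(logits)
next_token_probs = dict(zip(vocab, probs))
for token, p in sorted(next_token_probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token!r}: {p:.3f}")

# Greedy choice: the single most probable next token.
most_likely = max(next_token_probs, key=next_token_probs.get)
print("most likely next token:", most_likely)

# Sampling: draw a token with probability equal to its softmax output (seeded for repeatability).
rng = random.Random(0)
sampled = rng.choices(vocab, weights=probs, k=1)[0]
print("sampled next token:", sampled)
```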

As used in Paper 20 — Gemini: A Family of Highly Capable Multimodal Models

A function that converts raw scores into probabilities: softmax(xᵢ) = exp(xᵢ) / Σₖ exp(xₖ). In attention, it ensures that the weights sum to 1.
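In attention layers the function is typically applied row by row to a matrix of scores. Each row belongs to one query and holds that query's score against every key, so each query's weights sum to 1 independently of the other queries. The pure-Python sketch below assumes that layout. The 2×3 score matrix is invented, and `row_softmax` is a hypothetical helper name.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # max-subtraction for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [v / total for v in exps]


def row_softmax(scores: List[List[float]]) -> List[List[float]]:
    """Apply softmax independently to each row (one row per query)."""
    return [softmax(row) for row in scores]


# Hypothetical scores: 2 queries (rows) x 3 keys (columns).
scores = [
    [1.0, 0.0, -1.0],
    [0.2, 0.2, 0.2],
]
weights = row_softmax(scores)
for q, row in enumerate(weights):
    print(f"query {q}:", [round(w, 3) for w in row], "sum =", round(sum(row), 6))
```

When a query scores every key equally, as in the second row, the weights come out uniform (1/3 each). Normalizing each row separately is what keeps one query's scores from affecting another query's weights.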