Sigmoid

Appears in 1 paper

The activation function σ(z) = 1/(1+e⁻ᶻ).

As used in Paper 03: Learning Representations by Back-propagating Errors

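The activation function σ(z) = 1/(1+e⁻ᶻ). It squashes any real number to a value strictly between 0 and 1. Key property: it is differentiable everywhere, with derivative σ′(z) = σ(z)(1 − σ(z)), so the chain rule can propagate gradients through it. Key problem: this derivative never exceeds 0.25 (its value at z = 0) and approaches 0 as |z| grows, so gradients shrink each time they pass back through a sigmoid layer, which contributes to the vanishing gradient problem in deep networks.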
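To make the 0.25 bound concrete, here is a minimal Python sketch of the sigmoid and its derivative. The function names (`sigmoid`, `sigmoid_grad`, `gradient_bound`) are illustrative choices, not taken from the paper, and the depth bound ignores the weight terms that also appear in a real backward pass.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_bound(depth: int) -> float:
    """Upper bound on the product of `depth` sigmoid derivatives.

    Assumption: weights are ignored; each factor is at most 0.25.
    """
    return 0.25 ** depth


if __name__ == "__main__":
    print(sigmoid(0.0))        # 0.5
    print(sigmoid_grad(0.0))   # 0.25, the maximum of the derivative
    print(sigmoid_grad(5.0))   # much smaller: the unit is saturated
    print(gradient_bound(10))  # 0.25 ** 10, roughly 9.5e-07
```

Under these assumptions, ten stacked sigmoid layers can scale a gradient by at most about one millionth from the activations alone, which is the intuition behind the vanishing gradient problem.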