Long Short-Term Memory (LSTM) — Hochreiter & Schmidhuber, 1997

TL;DR

Backpropagation (Paper 03) gave us a way to train deep networks. But when those networks had to read a sequence (a sentence, a song, a stock ticker), the gradient silently died, shrinking toward zero as it travelled back through time. In practice the network could only "remember" a handful of recent steps. Anything older was lost.
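To make "the gradient silently died" concrete, here is a minimal sketch using a plain RNN with a single hidden unit. The function name `rnn_gradient_through_time`, the weights `w_h` and `w_x`, and the constant input 0.5 are hypothetical choices for illustration, not values from the paper. The recurrence is h_t = tanh(w_h * h_{t-1} + w_x * x_t), so by the chain rule dh_T/dh_0 is a product of T factors w_h * (1 - h_t^2). When every factor is smaller than 1 in magnitude, the product shrinks geometrically with the number of steps.

```python
import math


def rnn_gradient_through_time(w_h, w_x, xs, h0=0.0):
    """Return dh_T/dh_0 for the toy scalar RNN h_t = tanh(w_h * h_{t-1} + w_x * x_t).

    Hypothetical one-unit model with hand-picked weights, for illustration only.
    """
    h = h0
    grad = 1.0  # dh_0/dh_0
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        # Chain rule: dh_t/dh_{t-1} = w_h * tanh'(.) = w_h * (1 - h_t^2)
        grad *= w_h * (1.0 - h * h)
    return grad


if __name__ == "__main__":
    inputs = [0.5] * 50  # hypothetical constant input sequence
    for steps in (1, 5, 10, 20, 50):
        g = rnn_gradient_through_time(w_h=0.9, w_x=1.0, xs=inputs[:steps])
        print(f"{steps:>2} steps back: gradient = {g:.2e}")
```

With these weights each factor settles well below 1, so the signal reaching step 0 from 50 steps later is vanishingly small. Larger recurrent weights can make the product blow up instead; either way, a long product of per-step factors is the core problem.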

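Hochreiter and Schmidhuber proposed a radical fix: don't just pass a hidden state forward; also pass a protected memory line called the cell state. Small neural-network gates decide what to write to this memory and what to read from it. The 1997 cell had two of them, the input and output gates; the forget gate, which decides what to erase, was added by Gers, Schmidhuber and Cummins in 2000 and is part of the standard LSTM used today. Because the cell state is updated additively and flows almost unchanged from step to step, the error signal can survive hundreds or even thousands of time steps instead of a handful; the original paper reports bridging time lags of more than 1,000 steps on synthetic tasks. LSTMs went on to power Google's neural machine translation system (2016), voice assistants such as Siri, and most leading sequence models until Transformers took over around 2017.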
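The mechanism above can be sketched in plain Python, no PyTorch needed. This is a minimal scalar sketch under stated assumptions: one memory cell, hand-picked hypothetical weights in `params`, and the modern variant with a forget gate rather than the exact 1997 cell. The key line is the additive update c = f * c_prev + i * g. Along the direct cell-state path, the derivative of c_t with respect to c_{t-1} is just f, so keeping f close to 1 keeps the gradient alive.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM with forget, input and output gates.

    `p` maps hypothetical gate names to (w_x, w_h, bias) triples.
    Modern variant with a forget gate (Gers et al., 2000); the 1997
    cell had only input and output gates.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))   # how much old memory to keep (erase if near 0)
    i = sigmoid(pre("input"))    # how much new content to write
    g = math.tanh(pre("cand"))   # candidate content to write
    o = sigmoid(pre("output"))   # how much of the memory to expose
    c = f * c_prev + i * g       # cell state: additive update
    h = o * math.tanh(c)         # hidden state read out of the memory
    return h, c, f


# Hypothetical parameters: forget gate ignores its inputs and has a large
# positive bias, so f = sigmoid(4) ~ 0.98 at every step.
params = {
    "forget": (0.0, 0.0, 4.0),
    "input": (1.0, 0.5, 0.0),
    "cand": (1.0, 0.5, 0.0),
    "output": (1.0, 0.5, 0.0),
}


def cell_path_gradient(xs, p):
    """Product of forget-gate values = dc_T/dc_0 along the direct cell-state
    path (ignores the indirect paths that pass through h)."""
    h, c, grad = 0.0, 0.0, 1.0
    for x in xs:
        h, c, f = lstm_step(x, h, c, p)
        grad *= f
    return grad


if __name__ == "__main__":
    print("one step (h, c, f):", lstm_step(0.5, 0.0, 0.0, params))
    print("cell-path gradient after 100 steps:", cell_path_gradient([0.5] * 100, params))
```

With a forget-gate bias of 4, the product over 100 steps stays on the order of 0.1, whereas a plain tanh RNN whose per-step factors sit well below 1 loses its gradient far sooner. The bias value here is only an illustrative choice; in a trained LSTM the forget gate is learned and can open or close depending on the input.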

The journey in one line

Deep networks could see → but they couldn’t remember → LSTMs gave them a notebook they could selectively update.

What you will learn

Why a plain RNN forgets — the vanishing gradient, told in pictures.
Why the XOR problem (Paper 02) foreshadowed this failure on sequences.
The cell state — a student’s running notes for the neural network.
The three gates — forget, input, output — and what each one decides.
A worked numerical example: one LSTM step by hand.
A 25-line PyTorch LSTM cell you can run on Google Colab.
Why LSTMs dominated sequence modeling until 2017, and why Transformers eventually replaced them.

Sections

Historical context — 1997, winter of neural nets, RNNs born and broken
The problem — vanishing gradients and the XOR echo
The core idea — a protected memory line + three gates
How it works — one LSTM step, drawn out
The math — equations + a fully worked numeric example
The code — a minimal LSTM cell in PyTorch
Impact — Google Translate, Alexa, DeepMind
Limitations — why it couldn’t scale to GPT-size
What came next — word embeddings, seq2seq, attention

Resources

Glossary — every new term used in this paper
Quiz — 5 questions to test your understanding
Further reading — blogs, videos, original paper