Learning Representations by Back-propagating Errors
The Perceptron could learn, but only simple patterns. Multi-layer networks could represent complex patterns, but no practical method for training them was widely known. This paper filled that gap with a single elegant algorithm that still underlies how nearly every neural network is trained today.
David Rumelhart, Geoffrey Hinton, Ronald Williams · 1986 · Nature
“We describe a new learning procedure, back-propagation, for networks of neurone-like units.” — Opening of the paper
The Perceptron ended in a crisis.
Minsky and Papert had shown in 1969 that single-layer networks were fundamentally limited. Multi-layer networks could overcome those limitations, but nobody knew how to train them. The weights in the hidden layers seemed unreachable, because there was no obvious way to assign credit or blame to a hidden unit for an error observed at the output. The first AI winter set in.
For seventeen years, the problem had no widely known solution.
Then in 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published a four-page paper in Nature, one of the most prestigious scientific journals in the world, showing that the answer had been in plain sight all along: the chain rule of calculus, applied backwards through the network.
They called it backpropagation. It helped end the first AI winter and opened the modern era of neural networks.
Almost every neural network trained today, from GPT models to image classifiers to speech recognisers, is trained using this algorithm or a direct descendant of it.
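To make "the chain rule, applied backwards" concrete, here is a minimal sketch in plain Python. It is not the paper's code. The tiny network (one input, one sigmoid hidden unit, one sigmoid output), the squared-error loss, the function names, and the starting weights are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Forward pass through a hypothetical 1-1-1 network."""
    h = sigmoid(w1 * x)   # hidden unit activation
    y = sigmoid(w2 * h)   # output unit activation
    return h, y

def loss(y, target):
    """Squared error (an assumed choice of loss)."""
    return 0.5 * (y - target) ** 2

def backward(x, target, w1, w2):
    """Backward pass: chain rule from the loss back to each weight."""
    h, y = forward(x, w1, w2)
    dL_dy = y - target                # derivative of the squared error
    dL_dz2 = dL_dy * y * (1.0 - y)    # back through the output sigmoid
    dL_dw2 = dL_dz2 * h               # gradient for the output weight
    dL_dh = dL_dz2 * w2               # error passed back to the hidden unit
    dL_dz1 = dL_dh * h * (1.0 - h)    # back through the hidden sigmoid
    dL_dw1 = dL_dz1 * x               # gradient for the hidden weight
    return dL_dw1, dL_dw2

def train(x, target, w1, w2, lr=0.5, steps=200):
    """Plain gradient descent using the gradients from backward()."""
    for _ in range(steps):
        g1, g2 = backward(x, target, w1, w2)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

# Illustrative values only: one training example and arbitrary starting weights.
x, target = 1.0, 0.9
w1, w2 = 0.3, -0.2
initial_loss = loss(forward(x, w1, w2)[1], target)
w1_new, w2_new = train(x, target, w1, w2)
final_loss = loss(forward(x, w1_new, w2_new)[1], target)
print(initial_loss, final_loss)
```

The line computing `dL_dh` is the step that answers the credit-assignment question: the hidden unit's share of the blame is the output error scaled by the weight that connects the two.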
What is in this paper?
| Section | What you will learn |
|---|---|
| Historical Context | The AI winter, why hidden layers were stuck, who solved it |
| The Problem | Credit assignment — how do you blame a hidden neuron for a mistake? |
| The Core Idea | Propagate the error backwards through the network using the chain rule |
| How It Works | Forward pass → compute loss → backward pass → update weights, step by step |
| The Mathematics | Derivatives, chain rule, gradient descent — the full equations |
| The Code | Implement backpropagation from scratch in NumPy |
| Why It Mattered | The end of AI winter, deep learning, every modern AI product |
| Limitations | Vanishing gradients, local minima, computational cost |
| What Came Next | LSTMs address the vanishing gradient problem; the road to modern deep learning |
Paper at a glance
- Difficulty: Intermediate — requires understanding of derivatives and the chain rule
- Reading time: 55 minutes for all 9 sections
- Math you need: Derivatives · Chain Rule · Partial Derivatives · Gradient Intuition
- Key terms: Backpropagation · Gradient Descent · Loss Function · Hidden Layer