Paper 03
Intermediate

Learning Representations by Back-propagating Errors

The Perceptron could learn, but only simple patterns. Multi-layer networks could represent complex patterns, but nobody knew how to train them. This paper showed how, with a single elegant algorithm that remains at the heart of nearly every neural network trained today.

David Rumelhart, Geoffrey Hinton, Ronald Williams · 1986 · Nature


“We describe a new learning procedure, back-propagation, for networks of neurone-like units.” — Opening of the paper

The Perceptron ended in a crisis.

Minsky and Papert had shown in 1969 that single-layer networks were fundamentally limited. Multi-layer networks could overcome those limitations, but nobody knew how to train them. The weights in the hidden layers seemed unreachable: hidden units have no target values of their own, so there was no obvious way to assign them credit or blame for an error at the output. The first AI winter set in.

For seventeen years, this problem sat unsolved.

Then in 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published a four-page paper in Nature, one of the most prestigious scientific journals in the world, showing that the solution had been under everyone's noses all along. It was the chain rule of calculus, applied backwards through the network.

They called it back-propagation (now usually written backpropagation). It helped end the first AI winter and start the modern era of neural networks.
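To make "the chain rule, applied backwards" concrete, here is a minimal sketch for the smallest possible case: one input, one hidden unit, one output unit. The network shape, the sigmoid activation, the squared-error loss, and every name in the code (`forward`, `backward`, `numerical_grad`, the sample values) are illustrative assumptions, not taken from the paper. The finite-difference check at the end is only a sanity test that the hand-derived gradients are right.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Tiny illustrative network: input -> one hidden unit -> one output unit."""
    h = sigmoid(w1 * x)   # hidden activation
    y = sigmoid(w2 * h)   # output activation
    return h, y

def loss(y, target):
    return 0.5 * (y - target) ** 2

def backward(x, w1, w2, target):
    """Gradients of the loss w.r.t. both weights, via the chain rule."""
    h, y = forward(x, w1, w2)
    # Start at the output and work backwards.
    dL_dy = y - target
    delta2 = dL_dy * y * (1.0 - y)    # error signal at the output unit
    dL_dw2 = delta2 * h
    # The hidden unit is blamed in proportion to the weight w2 that
    # connects it to the output: this is the credit-assignment step.
    dL_dh = delta2 * w2
    delta1 = dL_dh * h * (1.0 - h)    # error signal at the hidden unit
    dL_dw1 = delta1 * x
    return dL_dw1, dL_dw2

def numerical_grad(x, w1, w2, target, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    def L(a, b):
        return loss(forward(x, a, b)[1], target)
    g1 = (L(w1 + eps, w2) - L(w1 - eps, w2)) / (2 * eps)
    g2 = (L(w1, w2 + eps) - L(w1, w2 - eps)) / (2 * eps)
    return g1, g2

# Arbitrary sample values chosen for illustration.
x, w1, w2, target = 1.5, 0.8, -0.4, 1.0
analytic = backward(x, w1, w2, target)
numeric = numerical_grad(x, w1, w2, target)
print("analytic:", analytic)
print("numeric: ", numeric)
```

The two printed pairs should agree to several decimal places. The point of the sketch is that the hidden weight's gradient is built from the output's error signal, which is exactly the quantity that had seemed unreachable.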

Nearly every neural network trained today, from GPT-style language models to image classifiers and speech recognisers, is trained using this algorithm or a direct descendant of it.


What is in this paper?

| Section | What you will learn |
| --- | --- |
| Historical Context | The AI winter, why hidden layers were stuck, who solved it |
| The Problem | Credit assignment: how do you blame a hidden neuron for a mistake? |
| The Core Idea | Propagate the error backwards through the network using the chain rule |
| How It Works | Forward pass → compute loss → backward pass → update weights, step by step |
| The Mathematics | Derivatives, chain rule, gradient descent: the full equations |
| The Code | Implement backpropagation from scratch in NumPy |
| Why It Mattered | The end of AI winter, deep learning, every modern AI product |
| Limitations | Vanishing gradients, local minima, computational cost |
| What Came Next | LSTMs solve the vanishing gradient; the road to modern deep learning |
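As a preview of the four-step loop named under "How It Works", the sketch below trains a small network on XOR, the classic pattern a single-layer perceptron cannot represent. It is not the implementation from the later sections. The layer sizes, learning rate, epoch count, random seed, sigmoid activations, and mean-squared-error loss are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatability only

# XOR inputs and targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed architecture: 2 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(0.0, 1.0, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, size=(4, 1))
b2 = np.zeros((1, 1))
lr = 0.5  # assumed learning rate

losses = []
for epoch in range(5000):
    # 1. Forward pass
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)

    # 2. Compute loss (mean squared error over the four examples)
    losses.append(float(np.mean(0.5 * (Y - T) ** 2)))

    # 3. Backward pass: chain rule from the output back to the input
    dZ2 = (Y - T) * Y * (1.0 - Y) / len(X)   # error signal at the output
    dW2 = H.T @ dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)       # error signal at the hidden layer
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # 4. Update weights by gradient descent
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("first loss:", losses[0])
print("last loss: ", losses[-1])
print("outputs:", Y.ravel())
```

Whether the outputs end up close to 0, 1, 1, 0 depends on the initial weights and the settings above, so treat the printed values as something to inspect rather than a guaranteed result. The loss should at least fall over the course of training.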


Previous paper: The Perceptron (1958)
Next paper: LSTM (1997)
