Section 03

The Core Idea

First Learning Machine | The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain (1958)

Start with a neuron

The human brain has about 86 billion neurons. In simplified form, each neuron is a small device: it receives signals from other neurons through connections called synapses, adds them up, and, if the total is strong enough, fires its own signal to the next neurons.

The strength of each synapse varies. Some connections are strong — a signal coming through that synapse has a big effect. Some are weak — they barely matter. And crucially, these strengths change over time as we learn. When you practise a skill, you are literally strengthening certain synaptic connections in your brain.

Rosenblatt’s key insight: model this in mathematics.


The artificial neuron

An artificial neuron — the Perceptron — works as follows:

It receives several inputs. Each input is a number. For example, if you are classifying whether a photo contains a cat, the inputs might be the pixel values of the image.

Each input has a weight. The weight is a number that says how important this input is. A high positive weight means: “when this input is high, I am more likely to fire.” A negative weight means: “when this input is high, I am less likely to fire.” A weight near zero means: “this input does not matter much.”

It computes a weighted sum. Multiply each input by its weight, then add all the results together.

It applies a threshold. If the weighted sum is above a threshold value, the neuron outputs 1 (it “fires”, saying “yes, this is a cat”). Otherwise, it outputs 0 (it says “no, not a cat”).

That is it. Input × weight, sum, threshold, output. Four steps.
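
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Rosenblatt's original formulation: the function name `perceptron_output` and the example inputs, weights, and threshold are all made up for this sketch.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs is above the threshold, else 0."""
    # Multiply each input by its weight, then add the results together.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Apply the threshold: fire (1) only if the sum is strictly above it.
    return 1 if weighted_sum > threshold else 0


# Illustrative values only (assumed, not from the text).
inputs = [1.0, 0.5, 0.0]
weights = [0.8, -0.4, 0.3]   # positive, negative, and unused-here weights
print(perceptron_output(inputs, weights, threshold=0.5))
```

Here the weighted sum is 0.8 − 0.2 + 0 = 0.6, which is above 0.5, so the neuron fires.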


The Indian analogy: the committee vote

Imagine your school is deciding whether to cancel classes due to heavy rain. A committee of five teachers votes.

Each teacher looks at one thing: the weather forecast, the distance students travel, the condition of school roads, past attendance on rainy days, the headmaster’s mood. These are the inputs.

But not all teachers have equal say. The headmaster’s vote counts for 3 points. A junior teacher’s vote counts for 1 point. These are the weights.

The committee adds up the weighted votes. If the total is above a threshold (say, 7 out of 10), they cancel classes. Below 7, classes continue.

The Perceptron is exactly this committee, but with numbers instead of teachers. The inputs are the data. The weights are the importance of each input. The threshold sets the decision boundary: the total at which the answer flips from “no” to “yes.”
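
To make the committee concrete, here is a small worked example in Python. Only the headmaster's 3 points, the junior teacher's 1 point, and the threshold of 7 come from the story; the three other teachers and their 2 points each are assumptions chosen so that the weights add up to 10.

```python
# Vote weights. The three 2-point teachers are assumed for illustration.
vote_weights = {
    "headmaster": 3,
    "senior_teacher_a": 2,
    "senior_teacher_b": 2,
    "senior_teacher_c": 2,
    "junior_teacher": 1,
}
THRESHOLD = 7  # classes are cancelled only if the total is above this


def committee_decision(votes):
    """votes maps each teacher to 1 (cancel classes) or 0 (keep classes)."""
    total = sum(vote_weights[name] * vote for name, vote in votes.items())
    return total, total > THRESHOLD


votes = {
    "headmaster": 1,
    "senior_teacher_a": 1,
    "senior_teacher_b": 1,
    "senior_teacher_c": 0,
    "junior_teacher": 1,
}
total, cancel = committee_decision(votes)
print(total, cancel)  # 3 + 2 + 2 + 0 + 1 = 8, which is above 7
```

If the junior teacher had voted to keep classes, the total would be exactly 7, which is not above the threshold, so classes would continue.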

The magic is what comes next: the weights are not fixed. The committee can learn which factors matter most, by observing what happened when they made past decisions.


The learning rule — the second key insight

Here is where Rosenblatt went beyond McCulloch and Pitts. He gave his neuron a way to learn.

The learning rule is simple:

  • If the Perceptron makes a correct prediction, do nothing. Leave the weights as they are.
  • If the Perceptron says “yes” but the answer was “no” (false positive): reduce the weights for the inputs that contributed to this wrong decision.
  • If the Perceptron says “no” but the answer was “yes” (false negative): increase the weights for the inputs that contributed to the missed detection.

That is the Perceptron Learning Rule. Adjust the weights in the direction that would have made the right prediction. Repeat over all training examples. Repeat again. And again.
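
One common way to write this rule as code is shown below. It is a sketch under assumptions: the name `update_weights` and the `learning_rate` parameter (how big each adjustment is) are not from the text above. The trick is that `target - prediction` is 0 for a correct answer, +1 for a false negative, and −1 for a false positive, so a single line covers all three cases.

```python
def update_weights(weights, inputs, prediction, target, learning_rate=1.0):
    """Apply one step of the perceptron learning rule and return the new weights."""
    # 0 if correct, +1 if we said "no" but should have said "yes",
    # -1 if we said "yes" but should have said "no".
    error = target - prediction
    # Each weight moves in proportion to its input, so inputs that were 0
    # (and so did not contribute to the decision) leave their weights unchanged.
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


weights = [0.5, -0.5]
# False negative: predicted 0, answer was 1. Only the active input's weight grows.
print(update_weights(weights, inputs=[1, 0], prediction=0, target=1))  # [1.5, -0.5]
# Correct prediction: nothing changes.
print(update_weights(weights, inputs=[1, 0], prediction=1, target=1))  # [0.5, -0.5]
```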

Rosenblatt proved mathematically that if the data is linearly separable (meaning the two classes can be separated by a straight line, or by a flat plane when there are more than two inputs), the Perceptron is guaranteed to find, in a finite number of updates, a set of weights that classifies every training example correctly. This is called the Perceptron Convergence Theorem.
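
A small experiment can illustrate the theorem. The sketch below trains a Perceptron on the logical AND of two inputs, a linearly separable problem chosen here as an example. It makes one assumption the text does not spell out: the threshold is replaced by a learnable `bias` added to the weighted sum (the neuron fires when the sum plus bias is above 0), which is a common convention. The names `predict` and `train` and the cap of 100 passes are also illustrative.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum plus bias is above 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(examples, max_epochs=100):
    """Repeat the learning rule until a full pass makes no mistakes."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, inputs)]
                bias += error  # the bias is adjusted like a weight on a constant input of 1
        if mistakes == 0:
            return weights, bias, epoch  # converged: every example is classified correctly
    return weights, bias, None  # gave up without converging


# Logical AND: output 1 only when both inputs are 1.
and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, epochs = train(and_examples)
print(weights, bias, epochs)
```

On this data the loop stops after a handful of passes, as the theorem promises. On data that is not linearly separable, the same loop would keep making mistakes until it hit `max_epochs`.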


A second analogy: the cricket coach

Imagine a young cricket bowler practising. Each delivery, the coach watches and gives feedback.

If the ball goes where the bowler intended — no change. Good. Do the same thing again.

If the ball swings too far to the off side — the coach says: “Adjust your grip, move your wrist slightly inward.” A specific correction for the inputs (grip, wrist angle) that caused the mistake.

If the ball is a full toss — a different correction for a different error.

Over hundreds of deliveries, the bowler adjusts. The “weights” (the bowler’s muscle memory for each parameter of the action) update based on feedback. Eventually, the bowler gets good.

The Perceptron does this with numbers instead of muscles. It adjusts its weights based on feedback (was the prediction right or wrong?) over hundreds or thousands of training examples. Eventually, it gets good.


How this is different from what came before

Before Rosenblatt, a machine generally had to be explicitly programmed. Every decision had to be coded by a human.

After Rosenblatt, a machine could adjust its own parameters. It could generalise from examples. The programmer’s job shifted from “write all the rules” to “provide good training examples.”

This is the shift from programming to learning — and it is the foundation of everything that follows in this story: deeper networks, backpropagation, and eventually the large language models of today.

