The Mathematics
This is the first paper on Ainiketan where real mathematics appears. Do not be intimidated. Every symbol is explained below, and there is a worked example with actual numbers so you can verify everything with pen and paper.
Mathematical concepts used in this paper
| Concept | Why needed | Where in paper | Tutorial |
|---|---|---|---|
| Vectors | The inputs to a Perceptron (pixel values, sensor readings, features) are a list of numbers, which is exactly what a vector is. Thinking of inputs as vectors lets us use compact notation and reason about them geometrically. | Every input to the Perceptron is a vector x = [x₁, x₂, …, xₙ] | Vectors — Introduction |
| Dot Product | The weighted sum, the core computation of the Perceptron, is exactly the dot product of the weight vector and the input vector. | The forward pass: ŷ = 1 if w · x ≥ θ, otherwise 0 | Dot Product |
| Probability Basics | Rosenblatt framed the Perceptron as a probabilistic model. The word “probabilistic” is in the paper’s title. He thought of the weights as encoding probabilities of connection strengths in a biological network. | Throughout the theoretical framing of the paper | Probability Basics |
The key equation: the Perceptron’s output
The Perceptron’s prediction is:
ŷ = 1 if (w₁x₁ + w₂x₂ + ... + wₙxₙ) ≥ θ
ŷ = 0 if (w₁x₁ + w₂x₂ + ... + wₙxₙ) < θ
Where:
- x₁, x₂, …, xₙ = the input features (numbers describing the example)
- w₁, w₂, …, wₙ = the weights (how important each feature is; these are learned)
- θ (theta) = the threshold (the minimum weighted sum needed to output 1)
- ŷ (y-hat) = the Perceptron’s prediction (0 or 1)
In compact vector notation:
ŷ = 1 if w · x ≥ θ
ŷ = 0 if w · x < θ
Where w · x means the dot product of vectors w and x.
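The prediction rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `dot` and `predict` and the sample weights are assumptions chosen for the example.

```python
def dot(w, x):
    """Dot product: multiply matching entries and add the results."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x, theta):
    """Return 1 if the weighted sum reaches the threshold theta, else 0."""
    return 1 if dot(w, x) >= theta else 0


# Illustrative values (not from the paper): two inputs, threshold 0.5.
w = [0.25, 0.5]
print(dot(w, [1, 1]))           # 0.75
print(predict(w, [1, 1], 0.5))  # 1, because 0.75 >= 0.5
print(predict(w, [1, 0], 0.5))  # 0, because 0.25 < 0.5
```

The comparison uses ≥, so a weighted sum that lands exactly on θ produces an output of 1, matching the equations above.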
The key equation: the learning rule
When the Perceptron makes a mistake, it updates each weight:
wᵢ ← wᵢ + η × (y − ŷ) × xᵢ
Where:
- wᵢ = the weight being updated (weight for input i)
- η (eta) = the learning rate (a small positive number, e.g. 0.1)
- y = the correct answer (0 or 1, provided in the training data)
- ŷ = the Perceptron’s prediction (0 or 1)
- xᵢ = the value of input i for this example
- (y − ŷ) = the error: +1 if we predicted 0 but answer was 1; −1 if we predicted 1 but answer was 0; 0 if correct
Notice what happens in each case:
- If correct (y = ŷ): error = 0, so wᵢ ← wᵢ + 0 = wᵢ. No change. ✓
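- If false negative (y = 1, ŷ = 0): error = +1, so wᵢ increases by η × xᵢ for every input with xᵢ > 0 (a weight whose input is 0 does not change). The Perceptron will be more likely to say 1 next time. ✓
- If false positive (y = 0, ŷ = 1): error = −1, so wᵢ decreases by η × xᵢ for every input with xᵢ > 0 (a weight whose input is 0 does not change). The Perceptron will be less likely to say 1 next time. ✓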
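The three cases can be checked with a small sketch of the update rule. The function name `update` and the numbers used here are illustrative assumptions, picked so the arithmetic is easy to follow by hand.

```python
def update(w, x, y, y_hat, eta):
    """Apply w_i <- w_i + eta * (y - y_hat) * x_i to every weight."""
    error = y - y_hat  # +1 false negative, -1 false positive, 0 correct
    return [wi + eta * error * xi for wi, xi in zip(w, x)]


# Illustrative values: the second input is 0, so its weight never moves.
w = [0.5, 0.5]
x = [1, 0]
eta = 0.25

print(update(w, x, y=1, y_hat=1, eta=eta))  # [0.5, 0.5]   correct: no change
print(update(w, x, y=1, y_hat=0, eta=eta))  # [0.75, 0.5]  false negative: w1 up
print(update(w, x, y=0, y_hat=1, eta=eta))  # [0.25, 0.5]  false positive: w1 down
```

Only the weight attached to a nonzero input changes, because the update is multiplied by xᵢ.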
Worked numerical example — full step by step
We train a Perceptron to learn the OR gate: output 1 if either input is 1.
Training data:
| x₁ | x₂ | Correct y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
Initial values: w₁ = 0, w₂ = 0, θ = 0.5, η = 0.1. The threshold θ stays fixed in this example; only the weights are updated.
Epoch 1, Example 1: x = [0, 0], y = 0
Weighted sum = (0 × 0) + (0 × 0) = 0
0 < 0.5 → ŷ = 0
Error = y − ŷ = 0 − 0 = 0 → No update
Weights unchanged: w₁ = 0, w₂ = 0
Epoch 1, Example 2: x = [0, 1], y = 1
Weighted sum = (0 × 0) + (0 × 1) = 0
0 < 0.5 → ŷ = 0
Error = 1 − 0 = +1 → UPDATE
w₁ ← 0 + 0.1 × 1 × 0 = 0 (x₁ = 0, so w₁ doesn't change)
w₂ ← 0 + 0.1 × 1 × 1 = 0.1 (x₂ = 1, so w₂ increases)
Weights: w₁ = 0, w₂ = 0.1
Epoch 1, Example 3: x = [1, 0], y = 1
Weighted sum = (0 × 1) + (0.1 × 0) = 0
0 < 0.5 → ŷ = 0
Error = 1 − 0 = +1 → UPDATE
w₁ ← 0 + 0.1 × 1 × 1 = 0.1 (x₁ = 1, so w₁ increases)
w₂ ← 0.1 + 0.1 × 1 × 0 = 0.1 (x₂ = 0, so w₂ unchanged)
Weights: w₁ = 0.1, w₂ = 0.1
Epoch 1, Example 4: x = [1, 1], y = 1
Weighted sum = (0.1 × 1) + (0.1 × 1) = 0.2
0.2 < 0.5 → ŷ = 0
Error = 1 − 0 = +1 → UPDATE
w₁ ← 0.1 + 0.1 × 1 × 1 = 0.2
w₂ ← 0.1 + 0.1 × 1 × 1 = 0.2
Weights: w₁ = 0.2, w₂ = 0.2
After Epoch 1, we still have errors, but the weights have grown from 0 to 0.2. Repeating the same procedure, each of Epochs 2, 3 and 4 raises both weights by 0.1 (examples 2 and 3 are still misclassified each time), so after Epoch 4 the weights are w₁ = 0.5, w₂ = 0.5. Epoch 5 then makes no mistakes:
- (0,0) → sum = 0 < 0.5 → output 0 ✓
- (0,1) → sum = 0.5 ≥ 0.5 → output 1 ✓
- (1,0) → sum = 0.5 ≥ 0.5 → output 1 ✓
- (1,1) → sum = 1.0 ≥ 0.5 → output 1 ✓
The OR gate is learned. Try verifying these by hand on paper.
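If you would rather check the hand calculation with a program, the loop below is one way to sketch it. It assumes the same settings as the worked example (weights start at 0, θ = 0.5 held fixed, η = 0.1, examples visited in table order). The names `train` and `OR_DATA` and the `max_epochs` safety cap are additions for this sketch, and `Fraction` is used so the arithmetic is exact, like pen and paper.

```python
from fractions import Fraction


def predict(w, x, theta):
    """Return 1 if the weighted sum reaches theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def train(data, w, theta, eta, max_epochs=100):
    """Repeat the learning rule until one full epoch has no mistakes.

    Returns (weights, epochs_run), where epochs_run includes the final
    error-free epoch. max_epochs is a safety cap added for this sketch.
    """
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in data:
            error = y - predict(w, x, theta)
            if error != 0:
                mistakes += 1
                w = [wi + eta * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, epoch
    raise RuntimeError("no error-free epoch within max_epochs")


# The OR training table, in the same order as above.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, epochs = train(
    OR_DATA,
    w=[Fraction(0), Fraction(0)],
    theta=Fraction(1, 2),
    eta=Fraction(1, 10),
)
print([float(wi) for wi in weights], epochs)  # [0.5, 0.5] 5
```

With these settings the loop stops in the fifth epoch, the first one with no mistakes. A different starting point, learning rate, or example order would generally end on different weights.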
What the Perceptron Convergence Theorem says
Rosenblatt proved: if the training data is linearly separable, the Perceptron learning rule will always find a set of weights that correctly classifies all training examples, in a finite number of steps.
“Linearly separable” means you can draw a straight line (or, in higher dimensions, a flat plane called a hyperplane) that separates the two classes perfectly.
The AND and OR gates are linearly separable. The XOR gate is not — and that is what broke the Perceptron. We discuss this in Limitations →.
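A small experiment makes the difference visible. The sketch below reuses the learning rule from this page with the same assumed settings as the worked example (zero starting weights, θ = 0.5 held fixed, η = 0.1); the helper name `converges` and the epoch cap are hypothetical choices for illustration. A capped run that never finishes is only consistent with non-separability; it is not a proof.

```python
from fractions import Fraction


def predict(w, x, theta):
    """Return 1 if the weighted sum reaches theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def converges(data, theta, eta, max_epochs=200):
    """Return True if some epoch within the cap finishes with no mistakes."""
    w = [Fraction(0)] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict(w, x, theta)
            if error != 0:
                mistakes += 1
                w = [wi + eta * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return True
    return False


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

theta, eta = Fraction(1, 2), Fraction(1, 10)
print("AND:", converges(AND_DATA, theta, eta))  # AND: True
print("XOR:", converges(XOR_DATA, theta, eta))  # XOR: False
```

For XOR an error-free epoch is impossible with this threshold: the rows (0,1) and (1,0) require w₂ ≥ 0.5 and w₁ ≥ 0.5, while the row (1,1) requires w₁ + w₂ < 0.5.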