Dot Product
1. What is this and why do we care?
The dot product is one of the most-used operations in machine learning.
Every time a neural network computes a weighted sum — and it does this billions of times per second — it is computing a dot product. Every attention score in a Transformer (the architecture behind ChatGPT) is a dot product. Every similarity score between two word embeddings in Word2Vec is a dot product.
Understanding the dot product is not optional. It is the heartbeat of neural networks.
2. Prerequisites
You need to understand vectors first. If you have not read the Vectors — Introduction tutorial, start there.
3. The intuition — before any symbols
Imagine you are a student choosing which college to attend. You care about three things: placement rate, campus quality, and distance from home. Each matters a different amount to you.
You rate each college on all three, and separately note how much you care about each factor:
Your priorities:    [placement, campus, distance]
How much you care:  [0.6, 0.3, 0.1]  ← weights
College A scores:   [0.9, 0.7, 0.5]  (great placement, good campus, far away)
College B scores:   [0.5, 0.9, 0.9]  (okay placement, great campus, very close)
To pick the better college for you, you multiply each score by how much that factor matters, then add:
College A: (0.9 × 0.6) + (0.7 × 0.3) + (0.5 × 0.1) = 0.54 + 0.21 + 0.05 = 0.80
College B: (0.5 × 0.6) + (0.9 × 0.3) + (0.9 × 0.1) = 0.30 + 0.27 + 0.09 = 0.66
College A wins for you (0.80 > 0.66).
You just computed two dot products. The dot product is a weighted sum — multiply corresponding elements and add them up. It tells you how well two vectors “align” given what each one contains.
This is exactly how the Perceptron computes its output: weights · inputs = decision score.
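The college comparison can be reproduced in a few lines of Python. This is a minimal sketch: the function name `weighted_score` and the variable names are illustrative choices, not part of any library.

```python
# Weights: how much each factor matters (placement, campus, distance).
weights = [0.6, 0.3, 0.1]

# Scores for each college on the same three factors.
college_a = [0.9, 0.7, 0.5]
college_b = [0.5, 0.9, 0.9]


def weighted_score(weights, scores):
    """Multiply each score by its weight, then add the products."""
    return sum(w * s for w, s in zip(weights, scores))


score_a = weighted_score(weights, college_a)
score_b = weighted_score(weights, college_b)

# Round for display: floating-point sums can carry tiny rounding errors.
print(round(score_a, 2))  # 0.8
print(round(score_b, 2))  # 0.66
print("College A" if score_a > score_b else "College B")  # College A
```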
4. A tiny worked example with real numbers
Let:
w = [2, 3] (weights)
x = [4, 1] (inputs)
Dot product w · x:
Step 1: Multiply corresponding elements
2 × 4 = 8
3 × 1 = 3
Step 2: Add them up
8 + 3 = 11
Result: w · x = 11
Check: does the formula give the same answer?
w · x = (w₁ × x₁) + (w₂ × x₂) = (2 × 4) + (3 × 1) = 8 + 3 = 11 ✓
5. The general rule
Given two vectors of the same dimension n:
a = [a₁, a₂, ..., aₙ]
b = [b₁, b₂, ..., bₙ]
The dot product is:
a · b = (a₁ × b₁) + (a₂ × b₂) + ... + (aₙ × bₙ)
Written more compactly using the symbol Σ (sigma, meaning “sum of”):
a · b = Σᵢ aᵢ × bᵢ
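The general rule translates almost word for word into code. The sketch below uses plain Python; the name `dot` is an illustrative choice, and the function assumes both lists have the same length.

```python
def dot(a, b):
    """Dot product: the sum of a[i] * b[i] over every position i."""
    total = 0
    for a_i, b_i in zip(a, b):
        total += a_i * b_i
    return total


w = [2, 3]  # weights from the worked example
x = [4, 1]  # inputs from the worked example

print(dot(w, x))  # 11
```

In practice you would usually call a library routine such as NumPy's `numpy.dot` instead of writing the loop by hand; the loop is shown here only to make the rule visible.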
Important properties:
- The result is always a single number (called a scalar), not a vector
- a · b = b · a (order does not matter)
- If two vectors are perpendicular (at right angles), their dot product is 0
- For two vectors of fixed lengths, the dot product is at its maximum when they point in exactly the same direction
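The properties listed above can be checked on small integer vectors. The vectors below are hypothetical examples chosen so that every result is exact; `dot` is an illustrative helper, not a library function.

```python
def dot(a, b):
    """Multiply corresponding elements and add the products."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


a = [3, 4]
b = [1, 2]

# The result is a single number, not a list.
print(dot(a, b))  # 11

# Order does not matter.
print(dot(a, b) == dot(b, a))  # True

# Perpendicular vectors give 0.
print(dot([1, 0], [0, 1]))   # 0
print(dot([2, 1], [-1, 2]))  # 0

# Three vectors of length 5, each dotted with a (also length 5).
# The one pointing the same way as a gives the largest result.
print(dot(a, [3, 4]))    # 25  (same direction)
print(dot(a, [5, 0]))    # 15  (different direction)
print(dot(a, [-3, -4]))  # -25 (opposite direction)
```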
6. A slightly bigger example
Perceptron weights: w = [0.5, -0.3, 0.8, 0.2]
Input features: x = [1.0, 0.5, 0.0, 1.0]
w · x = (0.5 × 1.0) + (-0.3 × 0.5) + (0.8 × 0.0) + (0.2 × 1.0)
= 0.5 + (-0.15) + 0.0 + 0.2
= 0.55
If the threshold is 0.5, then 0.55 ≥ 0.5 → the Perceptron fires (outputs 1).
Notice: the third input (0.0) contributed nothing — it was zero, so multiplying by its weight gave 0. Only the non-zero inputs “vote.”
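The same computation, followed by the threshold check, might look like this in Python. This is a sketch under the assumptions stated in the example (fire when the weighted sum is at least the threshold); the function names `dot` and `perceptron` are illustrative.

```python
def dot(a, b):
    """Multiply corresponding elements and add the products."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def perceptron(weights, inputs, threshold):
    """Return 1 if the weighted sum reaches the threshold, else 0."""
    weighted_sum = dot(weights, inputs)
    return 1 if weighted_sum >= threshold else 0


w = [0.5, -0.3, 0.8, 0.2]
x = [1.0, 0.5, 0.0, 1.0]

# Round for display: floating-point sums can carry tiny rounding errors.
print(round(dot(w, x), 2))              # 0.55
print(perceptron(w, x, threshold=0.5))  # 1
```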
7. The geometric meaning: measuring alignment
There is a deeper way to understand the dot product. It measures how much two vectors point in the same direction.
Formally:
a · b = |a| × |b| × cos(θ)
Where:
- |a| = the length (magnitude) of vector a
- |b| = the length of vector b
- θ (theta) = the angle between the two vectors
- cos(θ) = cosine of the angle (a number between -1 and +1)
What this means in plain language:
| Situation | cos(θ) | Dot product |
|---|---|---|
| Vectors point same direction | cos(0°) = 1 | Positive (the largest possible for those lengths) |
| Vectors are perpendicular | cos(90°) = 0 | Exactly 0 |
| Vectors point opposite directions | cos(180°) = -1 | Negative (the most negative possible for those lengths) |
This is why the dot product measures similarity. In Word2Vec, “king” and “queen” have similar word vectors — they point in roughly the same direction — so their dot product is high. “King” and “banana” point in very different directions — low dot product.
In attention mechanisms, the dot product between a Query vector and a Key vector measures how relevant that key is to the query. High dot product = high relevance = high attention weight.
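Rearranging the formula gives cos(θ) = (a · b) / (|a| × |b|), which lets you check the three cases in the table numerically. The sketch below uses only Python's standard `math` module; the helper names `dot`, `length`, and `cosine` are illustrative, and the example vectors are hypothetical.

```python
import math


def dot(a, b):
    """Multiply corresponding elements and add the products."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def length(v):
    """Magnitude |v|: the square root of v dotted with itself."""
    return math.sqrt(dot(v, v))


def cosine(a, b):
    """cos(θ) = (a · b) / (|a| × |b|). Assumes neither vector is all zeros."""
    return dot(a, b) / (length(a) * length(b))


a = [3, 4]

print(cosine(a, [6, 8]))    # 1.0   same direction
print(cosine(a, [-4, 3]))   # 0.0   perpendicular
print(cosine(a, [-3, -4]))  # -1.0  opposite direction
```

Dividing by the lengths removes the effect of magnitude, so the result depends only on direction.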
8. Where does this appear in AI?
Paper 02 — The Perceptron: weighted_sum = w · x. The Perceptron’s entire forward pass is one dot product followed by a threshold check.
Paper 05 — Word2Vec: Similarity between two words is measured as the dot product of their embedding vectors, or as cosine similarity, which is the dot product divided by the product of the two vector lengths. The two measures give the same number when both vectors have length 1.
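Paper 07 — Attention (Bahdanau): Attention scores measure how well a decoder hidden state matches each encoder hidden state. Bahdanau's original scoring function is a small feed-forward network (additive attention); the dot product between the two hidden states is the simpler scoring function used by later attention variants. High score = “pay attention to this.”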
Paper 08 — Transformer: The attention formula QKᵀ is a matrix of dot products — every Query vector dotted with every Key vector. This is the Q · K that you will see in the most important equation in modern AI.
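To make "a matrix of dot products" concrete, the sketch below builds the score matrix for a few toy vectors. The values of `Q` and `K` are hypothetical, and the function name `score_matrix` is an illustrative choice. The full Transformer formula also scales these scores and passes them through a softmax; both steps are left out here to keep the focus on the dot products.

```python
def dot(a, b):
    """Multiply corresponding elements and add the products."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def score_matrix(queries, keys):
    """Entry [i][j] is query i dotted with key j (the entries of QKᵀ)."""
    return [[dot(q, k) for k in keys] for q in queries]


# Hypothetical toy data: 2 queries and 3 keys, each of dimension 2.
Q = [[1, 0],
     [0, 2]]
K = [[1, 1],
     [2, 0],
     [0, 3]]

scores = score_matrix(Q, K)
for row in scores:
    print(row)
# [1, 2, 0]
# [2, 0, 6]
```

Each row belongs to one query, and each column to one key, so a matrix with 2 rows and 3 columns holds all 6 query-key dot products.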
9. Common mistakes
- Forgetting that the result is a scalar. The dot product of two vectors is always one number, not a vector. Students sometimes expect a vector back.
- Adding instead of multiplying first. The dot product is (a₁ × b₁) + (a₂ × b₂), not (a₁ + b₁) × (a₂ + b₂). Multiply element-by-element first, then sum.
- Applying it to vectors of different sizes. [1, 2, 3] · [4, 5] is undefined. You can only take the dot product of two vectors of the same dimension. In neural networks, mismatched dimensions are one of the most common coding errors.
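The size mismatch is worth guarding against in code, because a hand-written dot product built on Python's `zip` does not fail on its own: `zip` stops at the shorter list. The sketch below shows the pitfall and one possible guard; `dot_strict` is a hypothetical helper name.

```python
def dot_strict(a, b):
    """Dot product that refuses vectors of different lengths."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


# Pitfall: zip stops at the shorter list, so a naive version
# silently ignores the extra element instead of failing.
naive = sum(a_i * b_i for a_i, b_i in zip([1, 2, 3], [4, 5]))
print(naive)  # 14, computed from only the first two elements

# The guarded version reports the problem instead.
try:
    dot_strict([1, 2, 3], [4, 5])
except ValueError as err:
    print(err)  # dimension mismatch: 3 vs 2
```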
10. Try it yourself
Exercise 1:
Compute [3, 4] · [1, 2].
Answer:
(3 × 1) + (4 × 2) = 3 + 8 = 11
Exercise 2:
A Perceptron has weights w = [0.4, 0.6] and threshold 0.5.
Input A = [1, 0]. Input B = [0, 1]. Input C = [1, 1].
What does the Perceptron output for each input?
Answer:
Input A: w · [1,0] = (0.4×1) + (0.6×0) = 0.4. 0.4 < 0.5 → output 0
Input B: w · [0,1] = (0.4×0) + (0.6×1) = 0.6. 0.6 ≥ 0.5 → output 1
Input C: w · [1,1] = (0.4×1) + (0.6×1) = 1.0. 1.0 ≥ 0.5 → output 1
Exercise 3:
Three word vectors (simplified to 3 dimensions):
- “cat” = [0.9, 0.1, 0.2]
- “dog” = [0.8, 0.2, 0.1]
- “car” = [0.1, 0.9, 0.8]
Compute: cat · dog and cat · car. Which is higher? What does this tell you about similarity?
Answer:
cat · dog = (0.9×0.8) + (0.1×0.2) + (0.2×0.1) = 0.72 + 0.02 + 0.02 = 0.76
cat · car = (0.9×0.1) + (0.1×0.9) + (0.2×0.8) = 0.09 + 0.09 + 0.16 = 0.34
cat · dog is higher (0.76 > 0.34). This says “cat” and “dog” are more similar in meaning than “cat” and “car” — which matches our intuition. Both are animals, both are pets. This is the idea behind Word2Vec.
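If you want to check your arithmetic for Exercise 3, a short script along these lines would do it. The helper name `dot` is illustrative, and the vectors are the simplified ones from the exercise, not real Word2Vec embeddings.

```python
def dot(a, b):
    """Multiply corresponding elements and add the products."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


cat = [0.9, 0.1, 0.2]
dog = [0.8, 0.2, 0.1]
car = [0.1, 0.9, 0.8]

cat_dog = dot(cat, dog)
cat_car = dot(cat, car)

# Round for display: floating-point sums can carry tiny rounding errors.
print(round(cat_dog, 2))  # 0.76
print(round(cat_car, 2))  # 0.34
print(cat_dog > cat_car)  # True
```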
11. Interactive widget
Coming soon: Dot Product Explorer →
Drag two 2D vectors. Watch the dot product update in real time. See how it reaches maximum when they point the same way, and zero when they are perpendicular.