Vectors — Introduction

1. What is this and why do we care?

When a neural network processes your words, it turns each word into a list of numbers — a vector. When the Perceptron reads an image, each pixel’s brightness is a number in a vector. When an attention mechanism decides which words are related to which, it computes dot products between vectors.

Vectors are the universal language of machine learning. The inputs, the outputs, and the intermediate representations inside a neural network are all vectors or collections of vectors. If you understand vectors, you understand the shape of almost everything happening inside an AI model.

2. Prerequisites

None. This is the starting point. You need only basic arithmetic — addition and multiplication of numbers.

3. The intuition — before any symbols

Imagine you are describing a mango to someone who cannot see it. You might say: “It is about 12 centimetres long, 8 centimetres wide, weighs 200 grams, and is 70% yellow with 30% green.”

You have just described a mango using four numbers: [12, 8, 200, 0.7].

That list of numbers is a vector. Each number captures one attribute — one dimension of information — about the thing you are describing.

Now imagine your friend has a second mango: [11, 7, 180, 0.9]. You immediately know this mango is slightly smaller, lighter, and more yellow. Without seeing either mango, you can compare them just by comparing their numbers.

This is what vectors do in machine learning. They represent things as lists of numbers so that a computer can compare them, combine them, and reason about them mathematically.
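As a rough sketch of how a computer might make that comparison, the two mangoes can be stored as Python lists and subtracted component by component. The variable names and the `FEATURES` labels are illustrative choices made for this example, not part of any library.

```python
# Each mango is a list of four numbers in a fixed order:
# [length_cm, width_cm, weight_g, fraction_yellow]
FEATURES = ["length_cm", "width_cm", "weight_g", "fraction_yellow"]

mango_a = [12, 8, 200, 0.7]
mango_b = [11, 7, 180, 0.9]

# Component-by-component difference: how mango_b differs from mango_a.
difference = [b - a for a, b in zip(mango_a, mango_b)]

for name, d in zip(FEATURES, difference):
    direction = "higher" if d > 0 else "lower" if d < 0 else "the same"
    print(f"{name}: second mango is {direction} ({d:+.2f})")
```

The comparison only makes sense because both lists use the same order of attributes.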

In AI:

A word might be represented as a vector of 300 numbers, where each number captures something about the word’s meaning
An image might be a vector of thousands of pixel brightness values
A patient’s medical record might be a vector with entries for age, blood pressure, glucose level, etc.

The key point: a vector is just a list of numbers, where the order matters.

4. A tiny worked example with real numbers

Let us say two students, Priya and Arjun, are described by two attributes: their maths score and their science score.

Priya's vector:  p = [85, 72]
                       ↑    ↑
                     maths  science

Arjun's vector:  a = [60, 90]

These are 2-dimensional vectors. We can plot them (positions are approximate, and both axes start at 60 rather than 0):

Science
100 |
 90 |  • Arjun (60, 90)
 80 |
 70 |                 • Priya (85, 72)
 60 |
    +--+-----+-----+-----+-----+--→ Maths
       60    70    80    90

Each student is a point in 2D space. The vector is the address of that point.

Adding two vectors: Add corresponding numbers.

p + a = [85 + 60, 72 + 90] = [145, 162]

This gives the “combined score” vector — not very meaningful here, but vector addition is used constantly in neural networks.

Scaling a vector (multiplying by a number):

2 × p = [2 × 85, 2 × 72] = [170, 144]

This doubles every component. The vector points in the same direction but is twice as long.
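The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using plain lists; the helper name `length` is our own, and it measures a vector's length with the Pythagorean formula, which the text above does not define formally.

```python
import math

p = [85, 72]  # Priya: [maths, science]
a = [60, 90]  # Arjun: [maths, science]

# Addition: add corresponding components.
total = [pi + ai for pi, ai in zip(p, a)]
print(total)  # [145, 162]

# Scaling: multiply every component by the same number.
doubled = [2 * pi for pi in p]
print(doubled)  # [170, 144]


def length(v):
    """Straight-line distance from the origin to the point v (2D only)."""
    return math.hypot(v[0], v[1])


# Doubling every component doubles the length of the vector...
print(length(doubled) / length(p))

# ...but leaves the ratio between the components (the direction) unchanged.
print(doubled[1] / doubled[0], p[1] / p[0])
```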

5. The general rule

A vector with n components is written:

v = [v₁, v₂, v₃, ..., vₙ]

n is called the dimension of the vector
Each vᵢ is a single number called a component or element

Adding two vectors (they must have the same dimension):

[a₁, a₂, ..., aₙ] + [b₁, b₂, ..., bₙ] = [a₁+b₁, a₂+b₂, ..., aₙ+bₙ]

Scaling a vector by a number c:

c × [v₁, v₂, ..., vₙ] = [c×v₁, c×v₂, ..., c×vₙ]

Check: does the general rule give the same answer as our worked example above?

2 × [85, 72] = [2×85, 2×72] = [170, 144] ✓
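The two rules translate almost directly into code. The functions below, `add` and `scale`, are hypothetical helpers written for this tutorial rather than functions from a library; in practice a numerical library would normally do this work.

```python
def add(a, b):
    """Add two vectors component by component."""
    if len(a) != len(b):
        raise ValueError(
            f"cannot add vectors of dimension {len(a)} and {len(b)}"
        )
    return [ai + bi for ai, bi in zip(a, b)]


def scale(c, v):
    """Multiply every component of v by the number c."""
    return [c * vi for vi in v]


# The worked example from section 4.
print(add([85, 72], [60, 90]))  # [145, 162]
print(scale(2, [85, 72]))       # [170, 144]
```

The explicit length check is a design choice: it turns a dimension mismatch into an immediate error instead of a silently wrong result.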

6. A slightly bigger example

A school tracks 4 things about each student: maths score, science score, attendance percentage, and homework completion percentage.

Student Riya:   r = [78, 85, 95, 80]
Student Suresh: s = [92, 70, 60, 55]

These are 4-dimensional vectors. We cannot plot them easily (we would need 4D space) but we can still add and scale them:

r + s = [78+92, 85+70, 95+60, 80+55] = [170, 155, 155, 135]

Average of r and s = 0.5 × (r + s) = [85, 77.5, 77.5, 67.5]

This average vector is the midpoint between Riya and Suresh: a “typical student” of the two.

In neural networks, taking averages and weighted sums of vectors happens in almost every layer.
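An average is a special case of a weighted sum in which every weight is equal. The sketch below assumes a helper named `weighted_sum` (our own name, not a library function) and reproduces the average of r and s. The 0.75 and 0.25 weights in the second call are made-up values, chosen only to illustrate unequal weighting.

```python
def weighted_sum(weights, vectors):
    """Return the sum of weights[k] * vectors[k], component by component."""
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    if len(weights) != len(vectors):
        raise ValueError("need exactly one weight per vector")
    return [
        sum(w * v[i] for w, v in zip(weights, vectors))
        for i in range(n)
    ]


r = [78, 85, 95, 80]  # Riya
s = [92, 70, 60, 55]  # Suresh

# Equal weights of 0.5 give the average.
average = weighted_sum([0.5, 0.5], [r, s])
print(average)  # [85.0, 77.5, 77.5, 67.5]

# Unequal (made-up) weights pull the result towards the heavier-weighted vector.
mostly_riya = weighted_sum([0.75, 0.25], [r, s])
print(mostly_riya)
```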

7. Where does this appear in AI?

Paper 02 (The Perceptron): The inputs to the Perceptron are a vector x = [x₁, x₂, ..., xₙ] and the weights are a vector w = [w₁, w₂, ..., wₙ]. The forward pass computes a weighted sum of the inputs, which is the dot product w · x. Without understanding vectors, you cannot understand the Perceptron.

Paper 05 (Word2Vec): Each word in the model’s vocabulary is mapped to a vector, typically of a few hundred numbers (300 is a common choice). Words with similar meanings have vectors that point in similar directions. The famous example, king - man + woman ≈ queen, is vector arithmetic: adding and subtracting meaning-vectors.

Paper 08 (Transformer): The Query, Key, and Value matrices in attention are collections of vectors, one per token (roughly, one per word) in the input sequence. The entire attention mechanism is built from vector operations: dot products, scaling, and weighted combinations.
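To make the king/queen arithmetic concrete, here is a toy sketch. The 2D vectors below are invented for illustration (the first component loosely stands for “royalty”, the second for “maleness”). They are not real Word2Vec embeddings, which are learned from text and have many more dimensions.

```python
import math

# Made-up 2D "embeddings": [royalty, maleness]. Not real Word2Vec values.
toy_vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
}


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


def subtract(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


# king - man + woman
result = add(
    subtract(toy_vectors["king"], toy_vectors["man"]),
    toy_vectors["woman"],
)

# Find the word whose vector lies closest to the result.
nearest = min(toy_vectors, key=lambda w: math.dist(toy_vectors[w], result))
print(nearest)  # queen
```

With real embeddings the result rarely lands exactly on a word, which is why the relationship is written with ≈ rather than =.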

8. Common mistakes

Confusing dimension with size. A 300-dimensional word vector is not “big” in the sense of storage — it is just a list of 300 numbers. The word “dimension” means the number of components.
Forgetting that order matters. [85, 72] and [72, 85] are different vectors. The first means “85 in maths, 72 in science.” The second means “72 in maths, 85 in science.” The numbers are the same but the meaning is different.
Trying to add vectors of different sizes. You cannot add [85, 72] and [60, 90, 75] because they have different dimensions. In neural networks, vectors that are added together must have the same dimension.
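These mistakes are easy to reproduce in code. The sketch below uses plain Python lists, where two behaviours deserve care: the `+` operator joins lists end to end rather than adding them as vectors, and `zip` silently stops at the shorter input.

```python
maths_first = [85, 72]
science_first = [72, 85]

# Same numbers, different order: these are different vectors.
print(maths_first == science_first)  # False

# Dimension is just the number of components.
print(len(maths_first))  # 2

# Pitfall: on Python lists, + concatenates instead of adding component-wise.
print([85, 72] + [60, 90])  # [85, 72, 60, 90]

# Pitfall: zip stops at the shorter list, hiding a dimension mismatch.
shorter, longer = [85, 72], [60, 90, 75]
silently_truncated = [x + y for x, y in zip(shorter, longer)]
print(silently_truncated)  # [145, 162]

# Safer: check the dimensions explicitly before adding.
if len(shorter) != len(longer):
    print("cannot add: dimensions", len(shorter), "and", len(longer), "differ")
```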

9. Try it yourself

Exercise 1: Priya’s mark vector is [85, 72, 88] (maths, science, English). Arjun’s mark vector is [60, 90, 75]. Compute their average mark vector.

Answer:

Average = 0.5 × ([85, 72, 88] + [60, 90, 75]) = 0.5 × [145, 162, 163] = [72.5, 81, 81.5]

Exercise 2: A word embedding maps words to 2D vectors (simplified):

“Delhi” = [0.8, 0.2]
“Mumbai” = [0.7, 0.3]
“village” = [0.1, 0.9]

Which two words are more similar to each other — Delhi and Mumbai, or Delhi and village? (Hint: compare the numbers directly. Similar vectors have similar numbers.)

Answer:

Delhi [0.8, 0.2] and Mumbai [0.7, 0.3] are more similar: their numbers are close to each other. Both have a high first component (perhaps representing “city-ness”) and a low second component. Delhi and village are very different: the first components are 0.8 vs 0.1, and the second components are 0.2 vs 0.9.
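One way to turn “their numbers are close” into a single number is the straight-line (Euclidean) distance between two vectors: smaller means more similar. This is only one possible measure, chosen here for simplicity. The sketch uses `math.dist` from Python’s standard library (Python 3.8 or later).

```python
import math

delhi = [0.8, 0.2]
mumbai = [0.7, 0.3]
village = [0.1, 0.9]

d_delhi_mumbai = math.dist(delhi, mumbai)    # roughly 0.14
d_delhi_village = math.dist(delhi, village)  # roughly 0.99

print(f"Delhi-Mumbai:  {d_delhi_mumbai:.2f}")
print(f"Delhi-village: {d_delhi_village:.2f}")
print(d_delhi_mumbai < d_delhi_village)  # True
```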

Exercise 3: In a Perceptron, weights are [0.5, 0.3, 0.2] and inputs are [1, 0, 1]. What is the weighted sum? (Hint: multiply corresponding components and add them up. This is called the dot product — covered in the next tutorial.)

Answer:

Weighted sum = (0.5 × 1) + (0.3 × 0) + (0.2 × 1) = 0.5 + 0 + 0.2 = 0.7
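The same calculation as a short sketch in plain Python, multiplying corresponding components and adding up the products. The result is rounded before printing because decimal fractions are not always stored exactly as floating-point numbers.

```python
weights = [0.5, 0.3, 0.2]
inputs = [1, 0, 1]

# Multiply corresponding components, then add up the products.
weighted_sum = sum(w * x for w, x in zip(weights, inputs))
print(round(weighted_sum, 2))  # 0.7
```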


