dₖ (key dimension)
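The dimension of the Query and Key vectors. In the original Transformer, dₖ = d_model / h = 512 / 8 = 64. The attention scores are divided by √dₖ. If the components of a query q and a key k are independent with mean 0 and variance 1, then q · k has mean 0 and variance dₖ, so its typical magnitude grows as √dₖ. Without scaling, a larger dₖ would push the softmax into saturated regions where gradients are very small. Dividing by √dₖ brings the variance back to 1 and keeps the softmax well-behaved.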
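A minimal sketch using only the Python standard library checks the variance argument empirically. The helper names `head_dim` and `dot_product_variance` are hypothetical. The sketch also assumes that Query and Key components are i.i.d. standard normal. That is an idealization for illustration, not a property of trained weights.

```python
import math
import random


def head_dim(d_model: int, num_heads: int) -> int:
    """Per-head Query/Key dimension d_k = d_model / h."""
    assert d_model % num_heads == 0, "d_model must be divisible by h"
    return d_model // num_heads


def dot_product_variance(d_k: int, trials: int = 2000, scale: bool = False, seed: int = 0) -> float:
    """Empirical variance of q . k for random q, k with i.i.d. N(0, 1) components.

    If scale is True, each dot product is divided by sqrt(d_k), as in
    scaled dot-product attention.
    """
    rng = random.Random(seed)  # fixed seed for reproducible samples
    samples = []
    for _ in range(trials):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        s = sum(a * b for a, b in zip(q, k))
        if scale:
            s /= math.sqrt(d_k)
        samples.append(s)
    mean = sum(samples) / trials
    return sum((x - mean) ** 2 for x in samples) / trials


d_k = head_dim(512, 8)
print(d_k)                                     # 64
print(dot_product_variance(d_k))               # close to d_k (about 64)
print(dot_product_variance(d_k, scale=True))   # close to 1
```

The unscaled variance tracks dₖ, and dividing by √dₖ divides the variance by dₖ. That is why the scaled scores stay near unit variance regardless of head size.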