Query Head
In multi-head attention, one of n_heads attention heads that run in parallel, each with its own query projection. Each query head can learn to attend to different patterns in the input. In grouped-query attention (GQA), several query heads share a single key/value (KV) head, which shrinks the KV cache while each query head still computes its own attention pattern.
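The sharing in GQA can be illustrated with a minimal sketch. It assumes that consecutive query heads are grouped onto the same KV head, which is a common convention but not the only possible one. The function names and the configuration values (8 query heads, 2 KV heads, head dimension 64, sequence length 1024) are hypothetical and chosen only for illustration.

```python
def kv_head_for_query_head(query_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the index of the KV head that a given query head reads from.

    Assumes contiguous grouping: query heads 0..g-1 use KV head 0, and so on.
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    if not 0 <= query_head < n_heads:
        raise ValueError("query_head out of range")
    group_size = n_heads // n_kv_heads  # query heads per KV head
    return query_head // group_size


def kv_cache_elements(n_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Number of cached K and V values per layer for one sequence."""
    return 2 * n_kv_heads * head_dim * seq_len


# Hypothetical configuration: 8 query heads sharing 2 KV heads.
n_heads, n_kv_heads, head_dim, seq_len = 8, 2, 64, 1024

mapping = [kv_head_for_query_head(h, n_heads, n_kv_heads) for h in range(n_heads)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]

# Compare the KV cache of full multi-head attention (one KV head per query head)
# with the grouped configuration above.
mha = kv_cache_elements(n_heads, head_dim, seq_len)
gqa = kv_cache_elements(n_kv_heads, head_dim, seq_len)
print(mha // gqa)  # 4
```

With n_kv_heads equal to n_heads the mapping is one-to-one, which is ordinary multi-head attention; with n_kv_heads equal to 1, every query head reads the same KV head, which is multi-query attention.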