Communication Complexity

Appears in 1 paper

The amount of data that must be transferred between GPUs.

As used in Paper 19: Ring Attention with Blockwise Transformers for Near-Infinite Context

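In Ring Attention, a sequence of n tokens is split into P blocks, one per GPU, and the key/value (KV) blocks are passed from GPU to GPU around a ring so that every query block is eventually paired with every KV block. Each KV block therefore makes P - 1 hops, and each GPU sends (and receives) P - 1 KV blocks of size (n/P) × d, where d is the hidden dimension. That is O(n × d) total data per GPU, independent of P. Because each transfer runs concurrently with attention computation on the block already in memory, communication is hidden when computing one block takes at least as long as transferring the next. This holds once the per-GPU block is large enough relative to the ratio of GPU compute throughput to interconnect bandwidth.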
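The arithmetic above can be checked with a short back-of-envelope model. This is a minimal sketch, not code from the paper: the function names are hypothetical, and it assumes an even split of n tokens across P GPUs, 2-byte (bf16) K and V elements, P - 1 ring transfers per forward pass, and roughly 4·c²·d FLOPs per query/KV block pair of length c (the two matrix products, ignoring softmax). Under those assumptions the overlap condition 4·c²·d / F ≥ 2·c·d·b / B reduces to c ≥ b·F / (2·B), where F is per-GPU FLOP/s, B is interconnect bandwidth in bytes/s, and b is bytes per element.

```python
# Back-of-envelope model of Ring Attention communication (forward pass only).
# Assumptions (illustrative, not the paper's code): sequence split evenly across
# GPUs, K and V blocks forwarded P - 1 times around the ring, 2-byte elements,
# and 4 * c^2 * d FLOPs per block pair (QK^T plus weights @ V, softmax ignored).


def kv_bytes_sent_per_gpu(n_tokens: int, d_model: int, n_gpus: int,
                          bytes_per_elem: int = 2) -> int:
    """Bytes one GPU sends (and, symmetrically, receives) over P - 1 ring steps."""
    block_len = n_tokens // n_gpus
    # One K block plus one V block, each block_len x d_model.
    kv_block_bytes = 2 * block_len * d_model * bytes_per_elem
    return (n_gpus - 1) * kv_block_bytes


def comm_hidden_by_compute(block_len: int, d_model: int, flops_per_s: float,
                           bandwidth_bytes_per_s: float,
                           bytes_per_elem: int = 2) -> bool:
    """True if computing one block pair takes at least as long as sending one KV block."""
    compute_s = 4 * block_len**2 * d_model / flops_per_s
    comm_s = 2 * block_len * d_model * bytes_per_elem / bandwidth_bytes_per_s
    return compute_s >= comm_s


def min_block_len(flops_per_s: float, bandwidth_bytes_per_s: float,
                  bytes_per_elem: int = 2) -> float:
    """Smallest c with 4*c^2*d / F >= 2*c*d*b / B, i.e. c >= b*F / (2*B)."""
    return bytes_per_elem * flops_per_s / (2 * bandwidth_bytes_per_s)


if __name__ == "__main__":
    n, d = 8192, 128
    for p in (2, 8, 64):
        # Stays below the 2 * n * d * bytes_per_elem bound however many GPUs are used.
        print(p, kv_bytes_sent_per_gpu(n, d, p))
    # Hypothetical hardware figures (300 TFLOP/s, 300 GB/s), not measurements.
    print(min_block_len(flops_per_s=300e12, bandwidth_bytes_per_s=300e9))
```

Running it prints per-GPU byte counts of 2097152, 3670016, and 4128768 for P = 2, 8, and 64, all below the 2·n·d·b = 4194304-byte bound, which is the sense in which the per-GPU volume is O(n × d) regardless of P. With the hypothetical 300 TFLOP/s and 300 GB/s figures, the minimum block length comes out to 1000 tokens.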