Efficient Attention

Appears in 1 paper

Modified attention patterns (sliding window, global, sparse) that restrict which tokens attend to which, reducing the cost of attention from O(n²) in the sequence length n to O(n log n) or O(n) and making long sequences tractable.
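As a rough illustration of why these patterns save work, the sketch below builds a combined sliding-window and global attention mask and counts how many query-key pairs remain. The function names, the `window` and `global_tokens` parameters, and the choice of token 0 as the global token are all assumptions made for this example; they are not taken from any particular paper or library. Materializing the full mask is itself O(n²), so this only visualizes the pattern and is not an efficient implementation.

```python
def sparse_attention_mask(n, window, global_tokens=()):
    """Return an n x n boolean mask (hypothetical helper).

    mask[i][j] is True when query i may attend to key j, i.e. when
    j lies within `window` positions of i (sliding window), or when
    either i or j is a designated global token.
    """
    g = set(global_tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            local = abs(i - j) <= window
            is_global = i in g or j in g
            mask[i][j] = local or is_global
    return mask


def count_pairs(mask):
    """Count the query-key pairs that the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    for n in (64, 128, 256):
        allowed = count_pairs(sparse_attention_mask(n, window=4, global_tokens=[0]))
        print(f"n={n}: full={n * n}, sparse={allowed}")
```

With a fixed window and a fixed number of global tokens, the number of allowed pairs grows roughly linearly in n, whereas full attention grows quadratically.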

As used in Paper 20, Gemini: A Family of Highly Capable Multimodal Models

The Gemini report states that its models are trained to support a 32K-token context length using efficient attention mechanisms, naming multi-query attention as one example. It does not spell out which sparse patterns, if any, are combined.