Hardware-Aware Algorithm

Appears in 1 paper

An algorithm designed with GPU memory hierarchy in mind.

As used in Paper 21 (Mamba: Linear-Time Sequence Modeling with Selective State Spaces)

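Mamba's selective scan is built around the gap between slow GPU high-bandwidth memory (HBM) and fast on-chip SRAM. Rather than materializing the expanded discretized parameters and hidden states in HBM, the kernel loads the SSM parameters (Δ, A, B, C) into SRAM, performs the discretization and the recurrent scan there, and writes only the final outputs back to HBM. The intermediate states are recomputed during the backward pass instead of being stored. A composition of ordinary PyTorch ops would write each intermediate back to HBM, so good performance requires a custom fused CUDA kernel.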
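As a rough illustration of why fusion matters, here is a minimal pure-Python sketch. It is not Mamba's CUDA kernel. It assumes a single sequence, a simplified discretization (Ā = exp(ΔA), B̄ ≈ ΔB), and shapes x, Δ: L×D; A: D×N; B, C: L×N. The function names scan_materialized and scan_fused are hypothetical. The first builds every L×D×N intermediate, as a chain of separate tensor ops would; the second keeps only the D×N running state live, standing in for on-chip SRAM. Python has no memory hierarchy, so the returned counts are numbers of intermediate elements held, not measured memory traffic.

```python
import math


def scan_materialized(x, delta, A, B, C):
    """Unfused sketch: every L x D x N intermediate is built and kept,
    as if each step were a separate op writing its result to HBM."""
    L, D, N = len(x), len(A), len(A[0])
    # Discretized parameters for every timestep, channel and state index.
    dA = [[[math.exp(delta[t][d] * A[d][n]) for n in range(N)] for d in range(D)] for t in range(L)]
    dBx = [[[delta[t][d] * B[t][n] * x[t][d] for n in range(N)] for d in range(D)] for t in range(L)]
    # Every hidden state is kept as well.
    states = []
    h = [[0.0] * N for _ in range(D)]
    for t in range(L):
        h = [[dA[t][d][n] * h[d][n] + dBx[t][d][n] for n in range(N)] for d in range(D)]
        states.append(h)
    y = [[sum(C[t][n] * states[t][d][n] for n in range(N)) for d in range(D)] for t in range(L)]
    held = 3 * L * D * N  # elements in dA, dBx and states
    return y, held


def scan_fused(x, delta, A, B, C):
    """Fused sketch: discretize, update the state and project to the output
    in one pass. Only the D x N state stays live; only y (L x D) is emitted."""
    L, D, N = len(x), len(A), len(A[0])
    h = [[0.0] * N for _ in range(D)]
    y = []
    for t in range(L):
        row = []
        for d in range(D):
            acc = 0.0
            for n in range(N):
                a_bar = math.exp(delta[t][d] * A[d][n])  # discretized A, computed on the fly
                h[d][n] = a_bar * h[d][n] + delta[t][d] * B[t][n] * x[t][d]
                acc += C[t][n] * h[d][n]
            row.append(acc)
        y.append(row)
    held = D * N  # elements in the running state only
    return y, held


# Usage with small made-up inputs (illustrative values only).
L, D, N = 6, 3, 4
x = [[0.1 * (t + d) for d in range(D)] for t in range(L)]
delta = [[0.05 + 0.01 * d for d in range(D)] for t in range(L)]
A = [[-(n + 1.0) for n in range(N)] for d in range(D)]
B = [[0.5 * ((t + n) % 3) for n in range(N)] for t in range(L)]
C = [[1.0 / (n + 1) for n in range(N)] for t in range(L)]

y_ref, held_ref = scan_materialized(x, delta, A, B, C)
y_fused, held_fused = scan_fused(x, delta, A, B, C)
print(held_ref, held_fused)  # 216 12
```

Both functions produce the same outputs; only the amount of intermediate data differs, and in the unfused version it grows with L·D·N. The real kernel also parallelizes the scan across the sequence, which this sequential loop does not attempt.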
