Gradient Checkpointing
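A memory optimisation technique in which most intermediate activations are discarded during the forward pass and recomputed during the backward pass. Only a subset of activations (the checkpoints) is kept, which reduces peak activation memory at the cost of extra computation, typically roughly one additional forward pass. It complements Ring Attention for training: the two techniques can be used together.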
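The trade-off can be illustrated with a minimal pure-Python sketch. It is not taken from any particular framework. The toy scalar layer tanh(w * x), the function names, and the segment size are all assumptions chosen for illustration. The sketch differentiates a chain of layers twice: once storing every layer input, and once storing only the input to each segment and recomputing the rest during the backward pass.

```python
import math

def layer(w, x):
    # Toy scalar "layer" (hypothetical, for illustration only).
    return math.tanh(w * x)

def layer_grad(w, x):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    y = math.tanh(w * x)
    return w * (1.0 - y * y)

def grad_full(weights, x):
    """Backprop that stores every layer input during the forward pass."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, a in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, a)
    return g, len(inputs)  # gradient d(output)/d(input), values stored

def grad_checkpointed(weights, x, segment):
    """Backprop that stores only the input to each segment of layers."""
    starts = list(range(0, len(weights), segment))
    checkpoints = []
    for start in starts:
        checkpoints.append(x)  # keep the segment input only
        for w in weights[start:start + segment]:
            x = layer(w, x)    # activations inside the segment are discarded
    g = 1.0
    peak = 0
    for start, a in zip(reversed(starts), reversed(checkpoints)):
        seg = weights[start:start + segment]
        # Recompute this segment's layer inputs from its checkpoint.
        inputs = []
        for w in seg:
            inputs.append(a)
            a = layer(w, a)
        # Simple upper bound on live values: all checkpoints plus one segment.
        peak = max(peak, len(checkpoints) + len(inputs))
        for w, inp in zip(reversed(seg), reversed(inputs)):
            g *= layer_grad(w, inp)
    return g, peak

weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grad_full(weights, 0.5)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.5, segment=4)
print(g_full, stored_full)
print(g_ckpt, stored_ckpt)
```

In this sketch both functions return the same gradient, but the checkpointed version holds at most 8 values at once instead of 16, and pays for it by running each layer's forward computation a second time. Real frameworks apply the same idea to tensors and whole layer blocks rather than scalars.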