ALBERT

Appears in 1 paper

A BERT variant that reduces parameters by factorising the embedding matrix and sharing weights across Transformer layers.

As used in Paper 11 — BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding →

ALBERT-xxlarge outperforms BERT-large while using only around 70% of its parameters.
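
The two parameter-saving ideas in the definition can be illustrated with simple counting. The sketch below is not taken from the paper's code: the function names are hypothetical, the sizes (a 30,000-token vocabulary, hidden size 4,096, embedding size 128, 12 layers) are assumed for illustration, and the per-layer figure is a rough approximation that ignores biases and layer normalisation.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters.

    Without factorisation: one V x H matrix.
    With factorisation: a V x E matrix followed by an E x H projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params, num_layers, share_layers):
    """Count encoder parameters; shared layers are stored only once."""
    return per_layer_params if share_layers else per_layer_params * num_layers


# Illustrative sizes (assumptions, not values stated in this entry).
V, H, E, L = 30_000, 4_096, 128, 12

full = embedding_params(V, H)           # single V x H matrix
factorised = embedding_params(V, H, E)  # V x E plus E x H

# Rough per-layer estimate: 4*H^2 for attention + 8*H^2 for the feed-forward block.
per_layer = 12 * H * H

print(f"embedding, unfactorised: {full:,}")
print(f"embedding, factorised:   {factorised:,}")
print(f"encoder, separate layers: {encoder_params(per_layer, L, False):,}")
print(f"encoder, shared layers:   {encoder_params(per_layer, L, True):,}")
```

Under these assumed sizes the factorised embedding needs about 4.4 million parameters instead of about 123 million, and sharing one set of layer weights divides the encoder's stored parameters by the number of layers.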