ALBERT
A BERT variant that reduces parameter count by factorising the embedding matrix and sharing weights across Transformer layers. ALBERT-xxlarge outperforms BERT-large with roughly 70% of its parameters (about 235M versus 334M).
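
The arithmetic behind the two savings can be sketched as follows. This is a rough illustration, not ALBERT's actual implementation: the function names are hypothetical, biases and layer-norm weights are ignored, the per-layer count uses the common approximation of 12·H² weights per Transformer layer, and the sizes (vocabulary 30,000, embedding size 128, hidden size 4096, 12 layers) are assumed for the example.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Weights in a single V x H embedding matrix (BERT-style)."""
    return vocab_size * hidden_size


def factorised_embedding_params(vocab_size: int, embedding_size: int, hidden_size: int) -> int:
    """Weights in a V x E lookup table plus an E x H projection (ALBERT-style)."""
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(hidden_size: int, num_layers: int, shared: bool) -> int:
    """Approximate encoder weights: about 12 * H^2 per layer
    (4 * H^2 for attention, 8 * H^2 for the feed-forward block).
    With cross-layer sharing, one layer's weights are reused at every depth."""
    per_layer = 12 * hidden_size * hidden_size
    return per_layer if shared else per_layer * num_layers


# Illustrative sizes (assumptions for this sketch).
V, E, H, LAYERS = 30_000, 128, 4096, 12

full = embedding_params(V, H)
factorised = factorised_embedding_params(V, E, H)
print(f"embedding, unfactorised: {full:,}")
print(f"embedding, factorised:   {factorised:,}")

unshared = encoder_params(H, LAYERS, shared=False)
shared = encoder_params(H, LAYERS, shared=True)
print(f"encoder, unshared: {unshared:,}")
print(f"encoder, shared:   {shared:,}")
```

Factorisation pays off when the embedding size is much smaller than the hidden size, since the vocabulary-sized table then scales with E rather than H. Sharing makes the encoder's weight count independent of depth, although compute per forward pass is unchanged.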