DistilBERT
A compressed version of BERT created by knowledge distillation (training a small "student" model to mimic the outputs of a larger "teacher" model). It is 40% smaller and 60% faster than BERT-base while retaining 97% of its performance.
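To make "mimic the outputs" concrete, here is a minimal sketch of a generic soft-target distillation loss: the KL divergence between the teacher's and the student's temperature-softened output distributions. It is an illustration under stated assumptions, not DistilBERT's actual training code. The function names, the example logits, and the temperature of 2.0 are all hypothetical, and DistilBERT's full objective combines a term of this kind with other losses.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The temperature value is an assumption chosen for illustration.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0
    )


# Hypothetical logits over three classes.
teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]  # agrees with the teacher
poor_student = [0.5, 1.0, 4.0]      # prefers a different class

loss_match = distillation_loss(matching_student, teacher)
loss_poor = distillation_loss(poor_student, teacher)
```

A student that reproduces the teacher's distribution gets a loss of zero, and the loss grows as the two distributions diverge, so minimizing it pushes the small model toward the large model's behavior.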