RoBERTa

Appears in 1 paper

Robustly Optimized BERT Pretraining Approach.

As used in Paper 11 — BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding →

Facebook AI's 2019 replication of BERT, trained on more data, with larger batches, for longer, and without the next sentence prediction (NSP) objective. It outperformed BERT-large on GLUE, SQuAD, and RACE, which suggests that the original BERT was significantly undertrained.
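To make the scale of the change concrete, the sketch below compares the two pretraining recipes side by side. The `PretrainRecipe` class and its field names are hypothetical, and the figures are approximate values as reported in the BERT and RoBERTa papers (RoBERTa's "8K" batch is taken as 8,192 here), so treat it as an illustration rather than an exact configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainRecipe:
    """Hypothetical container for headline pretraining settings."""
    name: str
    corpus_gb: int     # approximate size of the uncompressed training text
    batch_size: int    # sequences per optimizer step
    train_steps: int   # number of optimizer steps
    uses_nsp: bool     # whether next sentence prediction is part of the loss

    def sequences_seen(self) -> int:
        # Total training sequences processed over the whole run.
        return self.batch_size * self.train_steps


# Approximate figures as reported in the respective papers (assumption).
BERT_LARGE = PretrainRecipe("BERT-large", 16, 256, 1_000_000, True)
ROBERTA = PretrainRecipe("RoBERTa", 160, 8_192, 500_000, False)

# How many more sequences RoBERTa processes than BERT-large.
ratio = ROBERTA.sequences_seen() / BERT_LARGE.sequences_seen()
print(f"{ROBERTA.name} sees about {ratio:.0f}x as many sequences as {BERT_LARGE.name}")
```

Under these assumed numbers, RoBERTa processes roughly an order of magnitude more training sequences from a corpus about ten times larger, which is the sense in which the original BERT is described as undertrained.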