Masked Language Modelling (MLM)
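BERT's primary pre-training objective. It selects 15% of the input token positions at random. Of these, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The model is then trained to predict the original token at each selected position from bidirectional context (the tokens on both its left and its right), and the loss is computed only over the selected positions. The task is analogous to the Cloze test in educational psychology.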
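The corruption step can be sketched in Python as follows, assuming the sequence is a plain list of integer token IDs. The names mask_tokens and IGNORE_INDEX are hypothetical. Marking unselected positions with the label -100 is an assumed convention, borrowed from the default ignore_index of PyTorch's cross-entropy loss. Each position is selected independently with probability mask_prob, which is an assumption and not a guarantee of exactly 15%. This sketch also does not exclude special tokens such as [CLS] and [SEP], although real pipelines usually skip them.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical label for positions excluded from the loss (assumed convention,
# matching PyTorch's default cross-entropy ignore_index).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (corrupted inputs, labels) using the 80/10/10 rule."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Assumption: independent per-position selection with probability mask_prob.
        if rng.random() >= mask_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # Remaining 10%: keep the original token unchanged.
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    ids = list(range(1000, 1020))
    # 103 and 30522 are the [MASK] id and vocabulary size of bert-base-uncased.
    corrupted, targets = mask_tokens(ids, mask_id=103, vocab_size=30522, rng=rng)
    print(corrupted)
    print(targets)
```

Keeping the original token in 10% of cases (and inserting a random one in another 10%) means the model cannot assume that only [MASK] positions need predicting. This matters because [MASK] never appears during fine-tuning.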