[MASK] token

Appears in 1 paper

The special token used to replace selected tokens during MLM pre-training.

As used in Paper 11 (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding)

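In BERT, [MASK] stands in for tokens selected for prediction during masked language model (MLM) pre-training. The model must predict the original token at each selected position from the surrounding context on both sides. Because [MASK] does not appear in the inputs seen during fine-tuning or downstream inference, pre-training and fine-tuning are mismatched. BERT reduces this mismatch by not always substituting [MASK]: of the 15% of token positions selected, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged.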
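
The replacement step can be illustrated with a minimal Python sketch. It follows the 15% selection rate and the 80/10/10 split described in the BERT paper, but the function name `mask_tokens`, the toy vocabulary, and the independent per-position coin flip are assumptions made for illustration; this is not the paper's reference implementation.

```python
import random

MASK = "[MASK]"
# Special tokens that are never selected for prediction (assumed set).
SPECIAL = {"[CLS]", "[SEP]", "[PAD]"}


def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Return (masked_tokens, labels) for one MLM training example.

    labels[i] holds the original token at each selected position and
    None elsewhere, so the loss is computed only on selected positions.
    Hypothetical helper: selection here is an independent draw per token.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= select_prob:
            continue  # position not selected; no prediction target
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK  # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return masked, labels


if __name__ == "__main__":
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]
    sentence = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
    print(mask_tokens(sentence, toy_vocab, rng=random.Random(0)))
```

At fine-tuning time no such replacement is applied, so the model only ever sees ordinary tokens; the random and unchanged cases are what expose it to unmasked inputs at prediction positions during pre-training.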