GLUE benchmark

Appears in 1 paper

General Language Understanding Evaluation.

As used in Paper 11 (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding)

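A suite of 9 English NLP tasks used to measure general language understanding. The tasks cover grammatical acceptability, sentiment, paraphrase and semantic similarity, and natural language inference. One of the inference tasks, QNLI, is a question-answering dataset recast as inference. A model's overall GLUE score is the average of its per-task scores. BERT-large scored 80.5 on GLUE at publication, a 7.7-point absolute improvement over the previous state of the art (OpenAI GPT, 72.8).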
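To make "overall GLUE score" concrete, here is a minimal sketch of the aggregation. It assumes the leaderboard convention that a task reporting two metrics (e.g. F1 and accuracy for MRPC and QQP, Pearson and Spearman for STS-B) is first averaged into one task score, and that the overall score is the unweighted mean over tasks. The function name `glue_score` and all numbers below are hypothetical placeholders, not BERT's reported per-task results. The official leaderboard's exact handling of details such as MNLI's two test sets is assumed here, not taken from this entry.

```python
from statistics import mean


def glue_score(task_metrics: dict[str, list[float]]) -> float:
    """Hypothetical helper: average each task's metrics, then average over tasks."""
    if not task_metrics:
        raise ValueError("need at least one task")
    # One score per task: the mean of however many metrics that task reports.
    task_scores = [mean(values) for values in task_metrics.values()]
    # Unweighted (macro) average across tasks.
    return mean(task_scores)


# Made-up placeholder numbers for illustration only, NOT results from any paper.
example = {
    "CoLA": [50.0],          # Matthews correlation (x100)
    "SST-2": [90.0],         # accuracy
    "MRPC": [80.0, 90.0],    # F1, accuracy
    "STS-B": [85.0, 85.0],   # Pearson, Spearman correlation
    "QQP": [70.0, 90.0],     # F1, accuracy
    "MNLI": [80.0, 80.0],    # matched, mismatched accuracy (assumed averaged)
    "QNLI": [90.0],          # accuracy
    "RTE": [70.0],           # accuracy
    "WNLI": [65.0],          # accuracy
}

print(round(glue_score(example), 2))  # prints 77.22
```

Averaging within a task before averaging across tasks keeps a task with two metrics from counting twice as much as a single-metric task.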