MMLU (Massive Multitask Language Understanding)

Appears in 1 paper

A benchmark of 57 diverse academic subjects (history, law, science, medicine) with 14,042 multiple-choice questions.

As used in Paper 20 (Gemini: A Family of Highly Capable Multimodal Models):

A benchmark of 57 diverse academic subjects (history, law, science, medicine) with 14,042 multiple-choice questions. Reference scores: 25% random chance (each question has four answer options), 86.4% GPT-4, 89.8% human expert, 90.04% Gemini Ultra.
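
As a rough illustration of how such scores are computed, the sketch below assumes each question has four answer options (A to D) and is scored by exact match against an answer key, so a uniform random guesser lands near 25%. The function names and the simulated answer key are hypothetical and do not come from the Gemini paper or any official evaluation harness.

```python
import random

# Assumption: every question has exactly four answer options.
CHOICES = ("A", "B", "C", "D")


def accuracy(predictions, answers):
    """Fraction of questions where the predicted letter matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def chance_accuracy(num_choices=len(CHOICES)):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_choices


def simulate_random_guessing(num_questions=14042, seed=0):
    """Score a random guesser against a synthetic (made-up) answer key."""
    rng = random.Random(seed)
    answers = [rng.choice(CHOICES) for _ in range(num_questions)]
    guesses = [rng.choice(CHOICES) for _ in range(num_questions)]
    return accuracy(guesses, answers)


if __name__ == "__main__":
    print(f"chance level: {chance_accuracy():.2%}")
    print(f"simulated random guesser: {simulate_random_guessing():.2%}")
```

The simulated figure will vary slightly with the seed, but with roughly 14,000 questions it stays within a fraction of a percentage point of the chance level, which is why reported model scores in the high 80s are far above guessing.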