Alignment / Aligning Language Models

Appears in 1 paper

Making language models behave in accordance with human values and preferences.

As used in Paper 15 — Training Language Models to Follow Instructions with Human Feedback →

A model is aligned if it acts on what users intend, being helpful, honest, and harmless, rather than merely predicting the next token of internet text. The InstructGPT paper reports that human labelers preferred outputs from the 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3, which suggests that alignment can matter more for usability than raw scale.

