Alignment / Aligning Language Models
Making language models behave in accordance with human values and preferences. A model is aligned if it does what its users intend, being helpful, honest, and harmless, rather than merely predicting the next token of internet text. InstructGPT demonstrates that alignment can matter more for usability than raw capability: human labelers preferred outputs from the 1.3B-parameter InstructGPT model over those from the 175B-parameter GPT-3.