Generalization / Generalizing to New Domains

Appears in 1 paper

Whether a trained reward model (or policy) performs well on new, unseen tasks or domains.

As used in Paper 15: Training Language Models to Follow Instructions with Human Feedback

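RLHF is described as generalizing better than pure supervised fine-tuning (SFT). A common explanation is that the reward model (RM) learns general principles of preference (clarity, relevance, safety) that transfer across domains. The paper reports, for instance, that its RLHF-trained models could follow instructions in non-English languages and answer questions about code, even though such data was rare in fine-tuning.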
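One common way to put a number on RM generalization is to hold out whole domains of comparison data and compare pairwise accuracy (how often the RM scores the human-preferred completion higher) on seen versus unseen domains. The sketch below is illustrative only. It assumes comparisons are (prompt, chosen, rejected) triples tagged with a domain. `Comparison`, `pairwise_accuracy_by_domain`, `generalization_gap`, and `toy_reward` are hypothetical names, and the length-based reward is a stand-in rather than a learned RM or the paper's evaluation code.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable, NamedTuple


class Comparison(NamedTuple):
    domain: str    # hypothetical domain tag, e.g. "summarization" or "code"
    prompt: str
    chosen: str    # completion the labeler preferred
    rejected: str  # completion the labeler did not prefer


# (prompt, completion) -> scalar reward
RewardFn = Callable[[str, str], float]


def pairwise_accuracy_by_domain(reward: RewardFn, data: Iterable[Comparison]) -> dict[str, float]:
    """Per domain, the fraction of comparisons where the preferred completion scores strictly higher."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for c in data:
        totals[c.domain] += 1
        if reward(c.prompt, c.chosen) > reward(c.prompt, c.rejected):
            hits[c.domain] += 1
    return {d: hits[d] / totals[d] for d in totals}


def generalization_gap(acc: dict[str, float], train_domains: set[str]) -> float:
    """Mean accuracy on training domains minus mean accuracy on held-out domains.

    A positive value means the RM does worse on domains it never saw.
    Assumes at least one seen and one unseen domain are present in `acc`.
    """
    seen = [a for d, a in acc.items() if d in train_domains]
    unseen = [a for d, a in acc.items() if d not in train_domains]
    return sum(seen) / len(seen) - sum(unseen) / len(unseen)


def toy_reward(prompt: str, completion: str) -> float:
    # Hypothetical stand-in for a learned RM: it simply prefers longer answers.
    return float(len(completion))


data = [
    Comparison("summarization", "Summarize the article.", "A concise but complete summary.", "Short."),
    Comparison("summarization", "Summarize the article.", "Covers the main points clearly.", "No."),
    Comparison("code", "Explain this function.", "ok", "A long but wrong explanation of the code."),
    Comparison("code", "Explain this function.", "It returns the sum of the list elements.", "Hm."),
]

acc = pairwise_accuracy_by_domain(toy_reward, data)
print(acc)                                         # {'summarization': 1.0, 'code': 0.5}
print(generalization_gap(acc, {"summarization"}))  # 0.5
```

Here "summarization" plays the role of a training domain and "code" a held-out one. The length heuristic happens to agree with every summarization label but misjudges one code comparison, so the gap is positive. With a real RM the same bookkeeping applies; only `toy_reward` would be replaced by the model's scoring function.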

