Helpful, Harmless, Honest (HHH)

Appears in 1 paper

The alignment criteria used to train InstructGPT: helpful (answers user queries well), harmless (doesn't enable or encourage harmful acts), honest (doesn't hallucinate or mislead).

As used in Paper 15 — Training Language Models to Follow Instructions with Human Feedback →

The alignment criteria used to train InstructGPT: helpful (answers user queries well), harmless (doesn't enable or encourage harmful acts), honest (doesn't hallucinate or mislead). Approximate values that guide human raters in providing preference labels.