Helpful, Harmless, Honest (HHH)
The alignment criteria used to train InstructGPT: helpful (answers user queries well), harmless (does not enable or encourage harmful acts), and honest (does not hallucinate or mislead). These are approximate values that guide human raters when they provide preference labels.