Data Annotation / Labeling
The process of having humans provide labels (e.g., preference comparisons) for training data. Annotation is a major cost in RLHF: this paper used ~90 human contractors to annotate 13k demonstrations and 33k comparisons. Scaling RLHF requires either more raters or AI-generated feedback, as in Constitutional AI.