Outcome Reward Model (ORM)
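A model that scores only the final output (right or wrong), without evaluating intermediate steps. It is less informative than a Process Reward Model (PRM), which scores each intermediate step, but simpler to train, because it needs only a label for the final answer.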
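To make "scores only the final output" concrete, here is a minimal sketch of how outcome-level training labels for an ORM might be derived. It is not the reward model itself, only the targets such a model would learn from. The names Trajectory and orm_label are hypothetical, and the exact string match against a reference answer is an assumption, since real setups often normalize answers or use a checker. Note that a trajectory containing a wrong intermediate step still receives label 1 if its final answer is correct. This is exactly the information an ORM discards and a PRM would keep.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    # Intermediate reasoning steps: never inspected when computing the ORM label.
    steps: List[str]
    # The only part of the trajectory the outcome label depends on.
    final_answer: str


def orm_label(traj: Trajectory, reference_answer: str) -> int:
    """Hypothetical outcome label: 1 if the final answer matches the reference, else 0.

    Assumes correctness is judged by exact match after stripping whitespace.
    """
    return int(traj.final_answer.strip() == reference_answer.strip())


# The first step here is wrong, but the final answer is right, so the ORM label is still 1.
flawed_but_correct = Trajectory(steps=["2 + 2 = 5", "5 - 1 = 4"], final_answer="4")
# Correct-looking step, wrong final answer: label 0.
wrong_answer = Trajectory(steps=["2 + 2 = 4"], final_answer="5")

print(orm_label(flawed_but_correct, "4"))  # prints 1
print(orm_label(wrong_answer, "4"))        # prints 0
```

A PRM would instead need one label per element of steps, which is more informative but requires step-level annotation.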