Bradley-Terry Model

Appears in 2 papers

A probabilistic ranking model from statistics, used here to model human preferences.

As used in Paper 15 — Training Language Models to Follow Instructions with Human Feedback →

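A probabilistic ranking model from statistics, used here to model human preferences. It assumes that the probability that response A is preferred over response B depends only on the difference in their rewards: P(A preferred) = σ(r_A - r_B), where σ is the logistic sigmoid, σ(x) = 1 / (1 + e^(-x)). Equal rewards give a probability of 0.5, and the probability approaches 1 as r_A - r_B grows. Simple but effective for binary preference classification.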
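
As a rough illustration of the formula above, the following Python sketch computes the preference probability from two scalar rewards. The function names `sigmoid` and `preference_probability` are hypothetical and chosen for this example only; they do not come from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_probability(r_a: float, r_b: float) -> float:
    """P(A preferred over B) under Bradley-Terry: sigmoid(r_a - r_b)."""
    return sigmoid(r_a - r_b)


if __name__ == "__main__":
    # Equal rewards: neither response is favoured.
    print(preference_probability(1.0, 1.0))
    # Higher reward for A: probability above 0.5.
    print(preference_probability(2.0, 0.5))
```

Only the reward difference matters: adding the same constant to both rewards leaves the probability unchanged, and the probabilities for "A over B" and "B over A" sum to 1.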

As used in Paper 22 — Constitutional AI: Harmlessness from AI Feedback →

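A statistical model for preference prediction. Given a pair of items (y_w, y_l), where y_w is the preferred ("winning") item and y_l the rejected ("losing") one, it models the probability that y_w is preferred as P(y_w > y_l) = σ(r(y_w) - r(y_l)), where r is the reward function and σ is the logistic sigmoid. It is a widely used model for training reward models from preference data.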
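
One common way to fit a reward model under this formulation is to minimise the negative log-likelihood of the observed preferences, -log σ(r(y_w) - r(y_l)). The entry itself does not state the training objective, so treat the sketch below as an assumption; the names `pairwise_loss` and `mean_pairwise_loss` are hypothetical, and the rewards are plain floats standing in for the outputs of a reward model.

```python
import math
from typing import Iterable, Tuple


def pairwise_loss(r_w: float, r_l: float) -> float:
    """Negative log-likelihood of one preference: -log sigmoid(r_w - r_l).

    Uses the identity -log sigmoid(d) = log(1 + exp(-d)), evaluated in a
    numerically stable form.
    """
    d = r_w - r_l
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def mean_pairwise_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average loss over (reward of preferred item, reward of rejected item) pairs."""
    losses = [pairwise_loss(r_w, r_l) for r_w, r_l in pairs]
    if not losses:
        raise ValueError("need at least one preference pair")
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Illustrative reward values only, not data from any paper.
    example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(mean_pairwise_loss(example_pairs))
```

The loss shrinks toward zero as the preferred item's reward exceeds the rejected item's by a wider margin, and grows roughly linearly when the ordering is wrong.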