Let’s walk through both strategies (best-of-N and sequential revision) on a single math problem.
Problem
“Solve for x: 2x² + 5x - 3 = 0”
Scenario Setup
- Model: 3.8B LLaMA model
- Single-attempt success rate on quadratic equations: p = 0.3 (30%)
- Available test-time tokens: 5,000 tokens (about 5 typical math solutions)
Strategy 1: Best-of-N
Allocation: Generate 5 independent solutions.
Attempt 1 (Model sampling 1):
Let me use the quadratic formula: x = (-b ± √(b² - 4ac)) / 2a
For 2x² + 5x - 3 = 0:
a = 2, b = 5, c = -3
Discriminant: 5² - 4(2)(-3) = 25 + 24 = 49
√49 = 7
x = (-5 ± 7) / 4
x = 2/4 = 0.5 OR x = -12/4 = -3
Answer: x = 0.5 or x = -3
Verification: Plug x = 0.5 into 2x² + 5x - 3: 2(0.5)² + 5(0.5) - 3 = 2(0.25) + 2.5 - 3 = 0.5 + 2.5 - 3 = 0 ✓
Plug x = -3 into 2x² + 5x - 3: 2(-3)² + 5(-3) - 3 = 2(9) - 15 - 3 = 18 - 15 - 3 = 0 ✓
Score: 1.0 (correct)
Attempt 2 (Model sampling 2):
I'll try to factor: 2x² + 5x - 3 = (2x - 1)(x + 3) = 0
Wait, let me verify: (2x - 1)(x + 3) = 2x² + 6x - x - 3 = 2x² + 5x - 3 ✓
So 2x - 1 = 0 → x = 1/2
Or x + 3 = 0 → x = -3
Answer: x = 1/2 or x = -3
Verification: Both solutions check out.
Score: 1.0 (correct)
Attempt 3 (Model sampling 3):
Using quadratic formula:
x = (-5 ± √(25 - 4·2·(-3))) / (2·2)
x = (-5 ± √(25 + 24)) / 4
x = (-5 ± √49) / 4
x = (-5 ± 7) / 4
x₁ = 2/4 = 1/2, x₂ = -12/4 = -3
Answer: x = 1/2 or x = -3
Score: 1.0 (correct)
Attempt 4 (Model sampling 4):
Let me rearrange: 2x² + 5x - 3 = 0
I'll use completing the square... actually this is hard.
Using quadratic formula: x = -5 ± √(25 + 24) / 4 = -5 ± 7 / 4
x = 2/4 or x = -12/4
x = 0.5 or x = -3
Wait, I think I made an error somewhere. Let me double-check...
Actually those are right.
Answer: x = 0.5 or x = -3
Score: 1.0 (correct, despite the uncertainty)
Attempt 5 (Model sampling 5):
2x² + 5x - 3 = 0
Let me think... maybe x = 1?
2(1)² + 5(1) - 3 = 2 + 5 - 3 = 4 ≠ 0
Try x = 0?
0 + 0 - 3 = -3 ≠ 0
Let me use the quadratic formula...
Actually, I'll just guess: x = 2 and x = -3
Let me check x = 2: 2(4) + 5(2) - 3 = 8 + 10 - 3 = 15 ≠ 0
So that's wrong.
Answer: I'm not sure, my guesses didn't work.
Score: 0.0 (incorrect/incomplete)
Best-of-N Decision:
The verifier sees 5 solutions:
- x = 0.5, -3 (Correct) ✓
- x = 0.5, -3 (Correct) ✓
- x = 0.5, -3 (Correct) ✓
- x = 0.5, -3 (Correct) ✓
- Incorrect/unsure (Wrong) ✗
The verifier checks which solutions satisfy the original equation:
- Solutions 1–4 all pass verification
- Solution 5 fails verification
The verifier picks solution 1 (or any of 1–4, they’re the same).
Final Answer: x = 0.5 or x = -3
Success: Yes ✓
Coverage: With 5 attempts, 4 correct, P(success) = 1 (at least one is correct)
Strategy 2: Sequential Revision
Allocation: Generate one solution, then iterate 4 times (critique + revise).
Round 1 (Initial Solution):
Question: Solve 2x² + 5x - 3 = 0
I'll use the quadratic formula: x = (-b ± √(b² - 4ac)) / 2a
a = 2, b = 5, c = -3
Discriminant = 25 + 24 = 49
√49 = 7
x = (-5 + 7) / 4 = 2/4 = 0.5
x = (-5 - 7) / 4 = -12/4 = -3
Final Answer: x = 0.5 or x = -3
Verifier Check: Both solutions are correct ✓
Critique: “Your solution is correct. You properly identified a = 2, b = 5, c = -3, calculated the discriminant correctly, and applied the quadratic formula properly. No revisions needed.”
Round 2 (Revision):
The model reads the critique:
The verifier says my solution is already correct. Let me double-check one more time:
- Check x = 0.5: 2(0.5)² + 5(0.5) - 3 = 0.5 + 2.5 - 3 = 0 ✓
- Check x = -3: 2(-3)² + 5(-3) - 3 = 18 - 15 - 3 = 0 ✓
Yes, both are correct.
Solution after round 2: x = 0.5 or x = -3 (unchanged, already optimal)
Rounds 3–4:
Since the solution is already perfect (both roots are correct), revisions don’t improve it further. The model converges to the correct answer by round 2.
Success: Yes ✓
Coverage: After 2 rounds, P(success) = 1.0 (found correct answer)
Comparison
| Metric | Best-of-N | Sequential Revision |
|---|---|---|
| Number of forward passes | 5 | 4 (1 initial + 3 more) |
| Tokens used | ~5,000 | ~4,000 |
| Final success | Yes (1.0) | Yes (1.0) |
| Confidence | High (4/5 attempts correct) | High (critique confirmed correctness) |
For this problem, both strategies succeed. But the sequential revision is slightly more efficient (4 rounds vs. 5 attempts).
Hard Problem Comparison: Geometry
If the problem were harder (e.g., a complex geometry problem where p = 0.2 per attempt):
Best-of-N with 5 attempts:
- P(at least one correct) = 1 - (0.8)^5 = 1 - 0.328 = 0.672 (67.2%)
Sequential Revision with 5 rounds, Δp = 0.08:
- Round 1: p = 0.2
- Round 2: p = 0.28
- Round 3: p = 0.36
- Round 4: p = 0.44
- Round 5: p = 0.52 (52%)
Best-of-N wins (67% vs 52%), so for easier problems (relative to the difficulty), best-of-N is better.
But if we had more tokens (say, 10 rounds for sequential revision):
- Round 10: p = 0.2 + 9 × 0.08 = 0.92 (92%)
Sequential revision now wins (92% vs 67%).
The paper’s insight: The optimal strategy depends on task difficulty and available compute. Hard problems + deep thinking beats easy problems + fast sampling.