Exploration-Exploitation Trade-off

Appears in 1 paper

The fundamental challenge in search and learning: should you exploit what you've learned (focus on high-reward nodes) or explore new options (try under-explored nodes)?

As used in Paper 24 (rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking):

UCB (Upper Confidence Bound) balances both sides of the trade-off during search. It scores each node by its estimated reward plus an exploration bonus that shrinks as the node is visited more often, so high-reward nodes keep being exploited while under-explored nodes still get tried.
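To make the balance concrete, here is a minimal sketch of a UCB-style node score. It uses the standard UCB1 form (average reward plus an exploration bonus), which is an assumption and not necessarily the exact formula in the paper. The names `ucb_score` and `select_child`, the dictionary layout of a node, and the default exploration constant `c=1.4` are all hypothetical and chosen only for illustration.

```python
import math


def ucb_score(total_reward, visits, parent_visits, c=1.4):
    """Standard UCB1 score: exploitation term plus exploration bonus.

    Illustrative only; `c` is an assumed exploration constant.
    """
    if visits == 0:
        # An unvisited node gets priority so every option is tried once.
        return math.inf
    exploit = total_reward / visits  # average reward observed so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # shrinks with visits
    return exploit + explore


def select_child(children, c=1.4):
    """Pick the child with the highest UCB score.

    Each child is a dict with hypothetical keys "reward" (summed reward)
    and "visits" (visit count).
    """
    parent_visits = sum(ch["visits"] for ch in children)
    return max(
        children,
        key=lambda ch: ucb_score(ch["reward"], ch["visits"], parent_visits, c),
    )


children = [
    {"name": "a", "reward": 9.0, "visits": 10},  # high average, well explored
    {"name": "b", "reward": 0.5, "visits": 1},   # lower average, barely explored
]
greedy = select_child(children, c=0.0)    # pure exploitation
balanced = select_child(children, c=1.4)  # exploitation plus exploration
```

With `c=0.0` the score reduces to the average reward, so the well-explored node "a" is chosen. With `c=1.4` the exploration bonus for the rarely visited node "b" outweighs its lower average, so "b" is chosen instead. Raising or lowering `c` shifts the search toward exploration or exploitation.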
