Upper Confidence Bound (UCB)
A formula that balances exploitation (choosing nodes with high average reward) and exploration (trying under-explored nodes). UCB = average reward + C × exploration bonus, where C is a constant that sets how strongly exploration is weighted. The exploration bonus shrinks as a node accumulates visits, so MCTS gradually concentrates its search on the path with the highest average reward.
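As a rough illustration, the formula can be sketched in Python using the common UCB1 form of the exploration bonus, sqrt(ln N / n), where N is the parent's visit count and n is the node's own visit count. The entry above does not say which bonus is meant, so that choice, the default C = sqrt(2), and the helper names ucb_score and select_child are all assumptions made for this example.

```python
import math


def ucb_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score for one child node (hypothetical helper).

    Assumes the bonus sqrt(ln N / n); other UCB variants use other bonuses.
    """
    if visits == 0:
        # An unvisited node has no average yet; give it top priority
        # so every child is tried at least once.
        return math.inf
    average_reward = total_reward / visits  # exploitation term
    exploration_bonus = math.sqrt(math.log(parent_visits) / visits)
    return average_reward + c * exploration_bonus


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB score.

    children is a list of (total_reward, visits) pairs.
    """
    return max(
        range(len(children)),
        key=lambda i: ucb_score(children[i][0], children[i][1], parent_visits, c),
    )
```

With c set to 0 the score reduces to the plain average reward, so selection becomes purely greedy; larger values of c push the search toward rarely visited nodes.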