Entropy Regularization

Appears in 1 paper

A term in RL that encourages exploration by rewarding policy entropy (randomness).

As used in Paper 15: Training Language Models to Follow Instructions with Human Feedback

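An entropy term added to the RL objective rewards policy entropy (randomness) and so encourages exploration. High entropy means the policy spreads probability across many actions; low entropy means it commits to a few. RLHF gets this effect implicitly through the KL penalty against the reference policy: KL(π ‖ π_ref) equals the cross-entropy of π under π_ref minus the entropy of π, so penalizing the KL term also rewards entropy and discourages collapse to a deterministic policy.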
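
The relationship between entropy and the KL penalty can be checked numerically. The following is a minimal sketch, not code from the paper: the three distributions and the helper names (entropy, cross_entropy, kl_divergence) are illustrative assumptions, and each policy is treated as a single distribution over four actions.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability entries contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def cross_entropy(p, q):
    """Cross-entropy -E_p[log q] in nats; assumes q > 0 wherever p > 0."""
    return -sum(x * math.log(y) for x, y in zip(p, q) if x > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

# Hypothetical action distributions, chosen only for illustration.
uniform = [0.25, 0.25, 0.25, 0.25]    # high entropy: explores all actions
peaked = [0.97, 0.01, 0.01, 0.01]     # low entropy: commits to one action
reference = [0.40, 0.30, 0.20, 0.10]  # stand-in for a reference policy

# Entropy is largest for the uniform policy (log 4 nats for four actions).
print("H(uniform):", entropy(uniform))
print("H(peaked): ", entropy(peaked))

# KL(pi || ref) = cross_entropy(pi, ref) - H(pi), so penalizing the KL
# term contains an implicit entropy bonus for the policy pi.
kl = kl_divergence(peaked, reference)
decomposed = cross_entropy(peaked, reference) - entropy(peaked)
print("KL(peaked || reference):", kl)
print("cross-entropy - entropy:", decomposed)
```

The last two printed values agree up to floating-point error. That agreement is the sense in which a KL penalty acts as entropy regularization: lowering the policy's entropy while holding the cross-entropy fixed raises the penalty.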