RL-CAI (Reinforcement Learning Constitutional AI)

Appears in 1 paper

The second stage of Constitutional AI.

As used in Paper 22 — Constitutional AI: Harmlessness from AI Feedback →

In this stage, the model generates pairs of responses and then judges which response in each pair better follows the constitution. A reward model is trained on these AI-generated preferences, and the policy is optimized against that reward model with PPO (Paper 15).
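The first three steps (generate pairs, judge them, fit a reward model) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the paper's implementation: `sample_response` stands in for sampling from the language model, `judge` stands in for the model's constitutional comparison (it simply prefers fewer flagged words), and the reward model is a two-weight linear function fit with a Bradley-Terry pairwise loss. The PPO step is not shown.

```python
import math
import random

# Hypothetical toy setup: flagged words are a crude proxy for "violates the constitution".
FLAGGED = {"insult", "threat"}
VOCAB = ["please", "thanks", "insult", "threat", "help", "sure"]

def sample_response(rng: random.Random, length: int = 4) -> str:
    """Stand-in for sampling a response from the policy model."""
    return " ".join(rng.choice(VOCAB) for _ in range(length))

def features(response: str) -> list[float]:
    """Hand-made feature vector: [flagged-word count, other-word count]."""
    words = response.split()
    flagged = sum(w in FLAGGED for w in words)
    return [float(flagged), float(len(words) - flagged)]

def judge(a: str, b: str) -> int:
    """Stand-in for the AI judge: 0 if `a` is preferred, 1 if `b` is."""
    return 0 if features(a)[0] <= features(b)[0] else 1

def build_preferences(rng: random.Random, n_pairs: int) -> list[tuple[str, str]]:
    """Generate response pairs and label each one as (chosen, rejected)."""
    data = []
    for _ in range(n_pairs):
        a, b = sample_response(rng), sample_response(rng)
        data.append((a, b) if judge(a, b) == 0 else (b, a))
    return data

def reward(w: list[float], response: str) -> float:
    """Linear reward model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, features(response)))

def train_reward_model(data, epochs: int = 50, lr: float = 0.1) -> list[float]:
    """Fit the weights by SGD on -log(sigmoid(r(chosen) - r(rejected)))."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in data:
            fc, fr = features(chosen), features(rejected)
            margin = reward(w, chosen) - reward(w, rejected)
            scale = 1.0 / (1.0 + math.exp(margin))  # equals 1 - sigmoid(margin)
            w = [wi + lr * scale * (c - r) for wi, c, r in zip(w, fc, fr)]
    return w

prefs = build_preferences(random.Random(0), n_pairs=200)
w = train_reward_model(prefs)
print("learned weights:", w)
```

In a real pipeline the learned reward would then serve as the training signal for the PPO update of the policy; here it only scores strings, which is enough to see that the judge's preferences are recoverable from pairwise labels.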
