RL-CAI (Reinforcement Learning Constitutional AI)
The second stage of Constitutional AI. Generate pairs of responses to each prompt, use a feedback model to judge which response in each pair better follows the constitution, train a reward model on these AI-generated preferences, and optimize the policy against that reward model with PPO (Paper 15).
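
The preference-labeling and reward-model steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method's actual implementation: responses are reduced to hypothetical (helpfulness, harm) feature pairs, `toy_constitution_score` and `toy_judge` stand in for the model acting as judge, the reward model is linear, and it is fit with a standard pairwise (Bradley-Terry) logistic loss. The PPO step is omitted.

```python
import math

# Each response is a hypothetical (helpfulness, harm) feature pair.
# In practice these would be text responses sampled from the model.
candidate_pairs = [
    ((0.9, 0.1), (0.8, 0.9)),
    ((0.9, 0.8), (0.2, 0.0)),
    ((0.7, 0.2), (0.3, 0.3)),
    ((0.1, 0.4), (0.6, 0.5)),
    ((0.5, 0.1), (0.4, 0.6)),
]

def toy_constitution_score(resp):
    # Assumption: stand-in for the judge's reading of the constitution.
    helpfulness, harm = resp
    return helpfulness - 2.0 * harm

def toy_judge(a, b):
    # Returns (chosen, rejected) according to the toy judge.
    if toy_constitution_score(a) >= toy_constitution_score(b):
        return a, b
    return b, a

def reward(w, resp):
    # Linear reward model: r(x) = w . x
    return sum(wi * xi for wi, xi in zip(w, resp))

def train_reward_model(pairs, steps=200, lr=0.5):
    # Gradient descent on the mean of -log sigmoid(r(chosen) - r(rejected)).
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-margin))  # P(chosen preferred)
            for i in range(len(w)):
                grad[i] -= (1.0 - p) * (chosen[i] - rejected[i])
        w = [wi - lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w

# Step 1: label preferences with the judge.
pairs = [toy_judge(a, b) for a, b in candidate_pairs]
# Step 2: fit the reward model to those preferences.
w = train_reward_model(pairs)
# Fraction of pairs where the reward model agrees with the judge.
accuracy = sum(reward(w, c) > reward(w, r) for c, r in pairs) / len(pairs)
print("weights:", w, "agreement:", accuracy)
```

In the full pipeline, the learned reward would then serve as the training signal for PPO on the policy model.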