Unfaithful Reasoning
When a language model generates intermediate reasoning steps that sound logical and plausible but don't actually reflect how the model arrived at its answer.
For example, the model might explain a calculation step by step while the final answer was actually produced by pattern-matching rather than by executing that calculation. This is a key limitation of chain-of-thought (CoT) prompting: the stated reasoning is often confabulated rather than a genuine record of how the answer was produced.
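One rough way to probe for this is to corrupt the reasoning and see whether the answer changes: if it does not, the answer probably did not depend on the stated steps. The sketch below is illustrative only. The names `answer_ignores_reasoning`, `pattern_matcher`, and `reasoning_follower` are hypothetical, and the two "models" are toy stand-ins rather than real language models.

```python
from typing import Callable

# Hypothetical interface (an assumption for this sketch):
# takes a question and a reasoning trace, returns a final answer.
AnswerFn = Callable[[str, str], str]


def answer_ignores_reasoning(
    answer_fn: AnswerFn, question: str, reasoning: str, corrupted: str
) -> bool:
    """Return True if the answer is the same with intact and corrupted reasoning."""
    return answer_fn(question, reasoning) == answer_fn(question, corrupted)


def pattern_matcher(question: str, reasoning: str) -> str:
    # Toy stand-in: answers from a memorized lookup and never reads the reasoning.
    return {"What is 17 * 3?": "51"}.get(question, "unknown")


def reasoning_follower(question: str, reasoning: str) -> str:
    # Toy stand-in: reads the answer off the last step of the reasoning.
    return reasoning.strip().split("=")[-1].strip()


question = "What is 17 * 3?"
reasoning = "17 * 3 = 10 * 3 + 7 * 3 = 30 + 21 = 51"
corrupted = "17 * 3 = 10 * 3 + 7 * 3 = 30 + 24 = 54"  # deliberate arithmetic error

print(answer_ignores_reasoning(pattern_matcher, question, reasoning, corrupted))
print(answer_ignores_reasoning(reasoning_follower, question, reasoning, corrupted))
```

Here the first check reports that the answer ignores the reasoning and the second reports that it does not. With a real model the signal is much noisier, so a single unchanged answer is weak evidence at best.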