Unfaithful Reasoning

Appears in 1 paper

Reasoning is unfaithful when a language model generates intermediate steps that sound logical and plausible but do not reflect how the model actually arrived at its answer.

As used in Paper 14 — Chain-of-Thought Prompting Elicits Reasoning in Large Language Models →

In the chain-of-thought (CoT) setting, the concern is that the written steps may not be the process that actually produced the answer. For example, the model might explain a calculation step by step even though the final answer was produced by pattern-matching rather than by executing that calculation. This is a key limitation of CoT: the stated reasoning can be confabulated rather than a genuine account of how the answer was reached.
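One observable symptom of this problem can be checked mechanically: whether the arithmetic written in the reasoning actually holds, and whether it leads to the answer the model reports. The sketch below is a minimal illustration under stated assumptions, not a method from the paper. The function name `check_trace`, the `STEP` pattern, and the sample traces are all hypothetical, and the parser only understands simple `a op b = c` statements. A mismatch suggests the answer did not come from the stated steps; a match does not prove the reasoning was faithful.

```python
import re

# Hypothetical pattern: matches simple "a op b = c" statements in a trace.
STEP = re.compile(
    r"(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)"
)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def check_trace(reasoning: str, final_answer: float, tol: float = 1e-9) -> dict:
    """Recompute each stated step and compare the last result to the answer.

    This checks surface consistency only. It cannot show that the model
    actually used these steps to produce its answer.
    """
    step_ok = []   # whether each stated equation is arithmetically correct
    claims = []    # the right-hand side the trace claims for each step
    for a, op, b, claimed in STEP.findall(reasoning):
        a, b, claimed = float(a), float(b), float(claimed)
        claims.append(claimed)
        if op == "/" and b == 0:
            step_ok.append(False)  # division by zero can never be valid
            continue
        step_ok.append(abs(OPS[op](a, b) - claimed) <= tol)

    return {
        "n_steps": len(step_ok),
        # True only if at least one step was found and all of them hold.
        "steps_valid": bool(step_ok) and all(step_ok),
        # True only if the last stated result equals the reported answer.
        "answer_matches": bool(claims) and abs(claims[-1] - final_answer) <= tol,
    }


# Made-up traces for illustration only.
trace = "There were 23 apples. 23 - 20 = 3. Then 3 + 6 = 9."
consistent = check_trace(trace, final_answer=9)
mismatched = check_trace(trace, final_answer=27)
```

Here `consistent` reports valid steps that lead to the reported answer, while `mismatched` reports valid steps that do not, which is the pattern one would expect when a plausible-looking derivation was not what produced the answer.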

