Study: showing model “reasoning” makes people trust answers even when wrong, but contrastive explanations help
This paper asks whether the natural-language explanations produced by large language models (LLMs) and large reasoning models (LRMs) make people trust their answers for the right reasons. The authors note that these models give no guarantees of correctness, and that the so-called reasoning traces (often called “chain of thought”) are not necessarily faithful records of how the model computed an answer. They develop a user-focused way to measure whether different explanation styles help people tell correct from incorrect model outputs, or merely persuade them to accept whatever the model says.
To test this, the researchers ran a between-subjects study that simulated a realistic but constrained situation: participants could not independently check the model’s answer. They used hard math, physics, and chemistry problems from the JEE-Bench dataset, which come from a competitive exam and are difficult for non-experts. The participants were high-school graduates recruited on Prolific who had basic subject knowledge but not the expertise to solve those problems reliably. For each problem, people saw the model’s answer together with one of several explanation types: a full reasoning trace (chain of thought), a shorter summary or post-hoc explanation, or a contrastive “dual” explanation that listed arguments for and against the model’s answer.
The main finding is that reasoning traces and post-hoc explanations are persuasive but not reliably informative. In other words, showing those explanations made people more likely to accept the model’s answers, whether the answers were correct or wrong. This increases the rate of false trust: users accepted incorrect outputs more often when those explanations were shown. By contrast, the contrastive dual explanations were the only condition that genuinely improved users’ ability to tell correct from incorrect outputs. Dual explanations produced a more balanced outcome, helping participants detect both right and wrong answers better than the other explanatory styles.
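The distinction the study draws, between an explanation that merely persuades and one that informs, can be made concrete with a small sketch. The function names and the numbers below are illustrative assumptions, not the paper's actual metrics or data: acceptance of correct answers minus acceptance of wrong answers serves as a simple discrimination score, while the average acceptance rate captures overall persuasiveness.

```python
# Hedged sketch (not from the paper): one way to separate "persuasion"
# from "discrimination" when participants accept or reject model answers.
# All data here are made up to mirror the qualitative pattern reported.

def acceptance_rates(trials):
    """trials: list of (answer_correct: bool, user_accepted: bool) pairs.

    Returns the acceptance rate on correct answers and on wrong answers.
    """
    correct = [accepted for is_correct, accepted in trials if is_correct]
    wrong = [accepted for is_correct, accepted in trials if not is_correct]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(correct), rate(wrong)

def summarize(label, trials):
    tar, far = acceptance_rates(trials)   # accept-correct, accept-wrong
    persuasion = (tar + far) / 2          # overall tendency to accept
    discrimination = tar - far            # ability to tell right from wrong
    print(f"{label}: accept-correct={tar:.2f} accept-wrong={far:.2f} "
          f"persuasion={persuasion:.2f} discrimination={discrimination:.2f}")
    return persuasion, discrimination

# Illustrative pattern: chain-of-thought raises acceptance of both correct
# and wrong answers (persuasive but not informative), while contrastive
# "dual" explanations mainly raise discrimination.
cot = ([(True, True)] * 9 + [(True, False)] * 1
       + [(False, True)] * 8 + [(False, False)] * 2)
dual = ([(True, True)] * 8 + [(True, False)] * 2
        + [(False, True)] * 3 + [(False, False)] * 7)

p_cot, d_cot = summarize("chain-of-thought", cot)
p_dual, d_dual = summarize("dual", dual)
```

Under these made-up numbers, the chain-of-thought condition scores higher on persuasion but lower on discrimination than the dual condition, which is the shape of the paper's main result.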