Large language models as tools to measure and tune financial behaviors such as loss aversion and herding
This paper explores whether large language models (LLMs) can act as measurement tools for psychological traits that matter in finance. The authors treat LLMs not as copies of people but as instruments that can be nudged with different profile prompts to reveal or change parameters such as loss aversion (stronger dislike of losses than preference for gains), herding (copying others), and extrapolation (expecting trends to continue). They ask whether these induced parameters are stable, meaningful, and comparable to human benchmarks.
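Loss aversion is typically formalized with a prospect-theory value function, in which a coefficient λ > 1 scales losses more heavily than equivalent gains. The sketch below is illustrative only; the exponent and λ defaults are the standard Kahneman–Tversky estimates, not parameters taken from this paper:

```python
def pt_value(x: float, lam: float = 2.25, alpha: float = 0.88) -> float:
    """Prospect-theory value of an outcome x relative to a reference point.

    Gains curve concavely (x ** alpha); losses are scaled by lam, so a
    loss of a given size weighs more than an equal-sized gain.
    """
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** alpha

# With lam > 1, a $100 loss outweighs a $100 gain in absolute value:
# abs(pt_value(-100)) > pt_value(100)
```

Measuring λ for an LLM then amounts to inferring which value of `lam` best rationalizes the model's observed choices over risky prospects.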
The team ran controlled experiments across eight canonical behavioral biases using four LLMs (GPT‑4o, GPT‑4o‑mini, Claude‑3.5‑Haiku, and Gemini‑2.5‑Pro). They evaluated about 19,200 agent–scenario pairs in the main analyses (24,000 before a fifth model was dropped over output-parsing problems). All prompts used synthetic financial scenarios that the authors say did not appear in the models' training data. Their method treats different prompt profiles as experimental treatments that should move latent parameters in predictable ways.
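The treatment design can be sketched as a grid over profile prompts and scenarios, with each cell yielding one agent–scenario observation. All names here are illustrative stand-ins, not the authors' code; in the real pipeline `run_cell` would query an LLM with the profile prompt prepended to the scenario:

```python
from itertools import product

# Hypothetical profile and scenario labels for illustration.
profiles = ["baseline", "loss_averse", "herding_prone", "extrapolative"]
scenarios = [f"scenario_{i}" for i in range(3)]

def run_cell(profile: str, scenario: str) -> dict:
    """One experimental cell: in practice, an LLM call whose response is
    parsed into a choice; here we just record the pairing."""
    return {"profile": profile, "scenario": scenario}

observations = [run_cell(p, s) for p, s in product(profiles, scenarios)]
# 4 profiles x 3 scenarios -> 12 agent-scenario pairs
```

The key design point is that the profile is the only thing varied across rows of the grid, so systematic shifts in the estimated parameters can be attributed to the prompt treatment.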
At baseline, the LLMs showed a systematic bias toward more rational-seeming choices than humans. Relative to human benchmarks, the models had weaker loss aversion and herding, and almost no disposition effect (the human tendency to sell winners too early and hold on to losers). For example, uncalibrated loss aversion λ fell between about 1.12 and 1.90 in the models, versus a human benchmark near 2.25 reported by the authors.
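One common way to back out λ from choice data uses mixed 50/50 gambles: under a piecewise-linear value function, an agent exactly indifferent between accepting and rejecting a coin flip with gain G and loss L implies λ = G/L. This is a textbook identification trick, stated here as a sketch rather than the paper's estimation procedure:

```python
def implied_lambda(gain: float, loss: float) -> float:
    """Loss aversion implied by indifference toward a 50/50 gamble.

    With v(x) = x for gains and v(x) = -lam * |x| for losses,
    indifference means 0.5 * gain - 0.5 * lam * loss = 0,
    so lam = gain / loss.
    """
    return gain / loss

# An agent who just barely accepts a +$225 / -$100 coin flip has
# implied_lambda(225, 100) == 2.25, the human benchmark cited above.
```

By this metric, the baseline models' λ of roughly 1.12 to 1.90 corresponds to accepting flips that pay only slightly more than they risk.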
When the researchers applied profile-based calibration—carefully designed prompt profiles that encourage a particular behavioral stance—the models shifted substantially and coherently. Calibrated loss-averse profiles raised λ to about 3.00. Herding-prone profiles moved herding rates up to about 90 percent. Extrapolative profiles raised a momentum coefficient to about 0.88, and loss-averse profiles increased anchoring correlation to about 0.67.

To test whether these calibrated numbers matter economically, the authors put the calibrated parameters into a simple agent-based asset pricing model. Calibrated extrapolation produced short-term momentum and long-term reversal patterns that match classic empirical findings, while baseline rational agents produced no momentum.
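The momentum half of that economic-significance check can be illustrated with a toy price process in which extrapolative demand feeds the previous return back into prices, producing positive short-horizon return autocorrelation that a rational baseline (extrapolation of zero) lacks. This is a minimal sketch, not the authors' asset pricing model, and it does not reproduce the long-term reversal side:

```python
import random

def simulate(extrapolation: float, steps: int = 2000, seed: int = 0) -> list[float]:
    """Log-price path: i.i.d. fundamental news plus extrapolative demand
    proportional to the previous period's return."""
    rng = random.Random(seed)
    prices, ret = [0.0], 0.0
    for _ in range(steps):
        news = rng.gauss(0, 1)
        ret = extrapolation * ret + news  # agents chase the last return
        prices.append(prices[-1] + ret)
    return prices

def autocorr1(prices: list[float]) -> float:
    """First-order autocorrelation of returns, the simplest momentum gauge."""
    rets = [b - a for a, b in zip(prices, prices[1:])]
    mu = sum(rets) / len(rets)
    num = sum((rets[i] - mu) * (rets[i + 1] - mu) for i in range(len(rets) - 1))
    den = sum((r - mu) ** 2 for r in rets)
    return num / den
```

With a positive extrapolation coefficient, `autocorr1` comes out clearly positive (momentum); with extrapolation set to zero, returns are pure news and the autocorrelation hovers near zero.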