Study finds a hidden model feature can nudge LLMs toward Bitcoin, but only so far
Researchers asked whether large language models (LLMs) have built‑in preferences for particular financial assets and whether those preferences can be found and changed inside the model. They developed a three‑level audit and used Bitcoin as a case study. The audit shows that (1) model outputs about Bitcoin depend strongly on how the question is framed, (2) an internal representation tied to Bitcoin can be located and manipulated in one open model, and (3) changing that internal signal moves the model’s portfolio choices by a measurable but limited amount.
At the first level, the team ran a behavioral audit across eight frontier LLMs. Each model was asked to rank eight money‑like instruments (US dollar cash, bank deposit, US Treasury bill, gold, Bitcoin, Ethereum, an S&P 500 index fund, and residential real estate) under eight different frames, such as “reliable money,” “crisis,” and “autonomous agent.” To check whether rankings came from names or from the instruments’ described properties, the experiment included label controls, an attribute‑swap test, and a made‑up label, “Bitstone.” Across 733 parsed trials, Bitcoin ranked around 5th of 8 under the “reliable money” frame but moved near the top under the crisis and autonomous‑agent frames. The attribute‑swap and synthetic‑label tests showed that the rankings follow the described attributes, not just the name “Bitcoin.”
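To make the frame comparison concrete, the sketch below shows one way an asset's average rank per frame could be computed from parsed trials. It is illustrative only: the function name `mean_rank_by_frame`, the trial format, and the four toy rankings are assumptions made up for this example, not the study's code or data.

```python
from collections import defaultdict
from statistics import mean


def mean_rank_by_frame(trials, asset):
    """Average 1-based rank of `asset` within each frame.

    `trials` is an iterable of (frame, ranking) pairs, where `ranking`
    lists instrument names from best to worst. This trial format is a
    hypothetical one chosen for the example.
    """
    ranks = defaultdict(list)
    for frame, ranking in trials:
        if asset in ranking:  # skip trials where the asset was not parsed
            ranks[frame].append(ranking.index(asset) + 1)
    return {frame: mean(values) for frame, values in ranks.items()}


# Toy rankings invented for illustration; they are NOT results from the study.
toy_trials = [
    ("reliable money", ["US Treasury bill", "US dollar cash", "bank deposit", "gold",
                        "Bitcoin", "S&P 500 index fund", "residential real estate", "Ethereum"]),
    ("reliable money", ["US dollar cash", "US Treasury bill", "gold", "bank deposit",
                        "S&P 500 index fund", "Bitcoin", "residential real estate", "Ethereum"]),
    ("crisis", ["gold", "Bitcoin", "US dollar cash", "US Treasury bill",
                "bank deposit", "residential real estate", "Ethereum", "S&P 500 index fund"]),
    ("crisis", ["Bitcoin", "gold", "US dollar cash", "US Treasury bill",
                "bank deposit", "Ethereum", "residential real estate", "S&P 500 index fund"]),
]

print(mean_rank_by_frame(toy_trials, "Bitcoin"))
```

Comparing the per-frame averages side by side is what reveals frame sensitivity: the same asset can sit mid-table under one frame and near the top under another.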
At the second level, the authors opened the internals of one open model, Gemma 3, and searched thousands of learned sparse‑autoencoder (SAE) features. An SAE feature is a single learned direction in the model’s internal activations that tends to activate for a particular concept. They found a dominant Bitcoin‑selective SAE feature. When they amplified that feature (that is, increased its internal activation), the model shifted toward Bitcoin. When they suppressed it, the model shifted away. These steering effects were present even when the word “Bitcoin” did not appear in the prompt, and the strongest steering was observed at the 27B‑parameter scale of Gemma 3.
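The amplify and suppress operations can be pictured with a small NumPy sketch. It assumes a common steering recipe, adding a scaled copy of the feature's decoder direction to an activation vector, along with a plain ReLU encoder and tiny hand-made weights. None of this is taken from the study; the names `sae_encode` and `steer` and all numbers are hypothetical.

```python
import numpy as np


def sae_encode(x, W_enc, b_enc):
    """Feature activations of a toy SAE with a plain ReLU encoder (an assumption)."""
    return np.maximum(0.0, x @ W_enc + b_enc)


def steer(x, W_dec, feature_idx, delta):
    """Shift activation vector x along one feature's decoder direction.

    delta > 0 amplifies the feature; delta < 0 suppresses it.
    """
    return x + delta * W_dec[feature_idx]


# Toy SAE: 3 features over a 4-dimensional activation space.
# Decoder rows are unit basis vectors so the arithmetic is easy to follow.
W_dec = np.eye(3, 4)        # shape (n_features, d_model)
W_enc = W_dec.T             # shape (d_model, n_features)
b_enc = np.zeros(3)

x = np.array([0.5, 1.0, 0.0, 0.2])  # made-up activation vector
FEATURE = 1                          # stand-in for the "Bitcoin-selective" feature

baseline = sae_encode(x, W_enc, b_enc)[FEATURE]
amplified = sae_encode(steer(x, W_dec, FEATURE, +2.0), W_enc, b_enc)[FEATURE]
suppressed = sae_encode(steer(x, W_dec, FEATURE, -2.0), W_enc, b_enc)[FEATURE]

print(baseline, amplified, suppressed)
```

In a real model the steered vector would be written back into the forward pass at the layer the SAE was trained on, and the effect would be read off the model's outputs rather than from the feature activation itself.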