REdit: reshaping neural circuits to edit specific reasoning errors in large language models
Large language models can reason in impressive ways. But they also make systematic reasoning mistakes that are hard to fix with broad retraining. This paper introduces “reasoning editing,” a way to change one specific reasoning pattern inside a model while leaving other reasoning abilities intact. The authors present REdit, a method that first reshapes the model’s internal circuits and then applies a focused parameter edit to correct the target inference rule.
The key insight behind REdit is the Circuit‑Interference Law. The authors report that when an edit for one reasoning pattern affects another pattern, the amount of interference is roughly proportional to how much the two patterns share the same internal neural circuitry. To reduce harmful side effects, REdit actively disentangles overlapping circuits for different reasoning patterns before making the actual correction. The reshaping step is called Contrastive Circuit Reshaping and aims to make the target circuit more distinct from others.
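The proportionality the law describes can be illustrated with a toy model. The sketch below is a hypothetical rendering, not the paper's formulation: it assumes circuits are identified as sets of named components (e.g. attention heads), measures overlap with a Jaccard score, and models interference as a constant times that overlap.

```python
def circuit_overlap(circuit_a: set, circuit_b: set) -> float:
    """Jaccard overlap between two circuits, each represented as a set of
    internal components (e.g. attention heads or MLP neurons)."""
    if not circuit_a or not circuit_b:
        return 0.0
    return len(circuit_a & circuit_b) / len(circuit_a | circuit_b)

def predicted_interference(circuit_a: set, circuit_b: set, k: float = 1.0) -> float:
    """Interference on pattern B caused by editing pattern A, modeled as
    roughly proportional to circuit overlap (constant k is illustrative)."""
    return k * circuit_overlap(circuit_a, circuit_b)

# Two reasoning patterns whose circuits share some attention heads
# (component names are made up for illustration):
pattern_a = {"L3.H2", "L5.H7", "L8.H1"}
pattern_b = {"L3.H2", "L5.H7", "L9.H4"}
print(predicted_interference(pattern_a, pattern_b))  # 0.5
```

Under this toy model, disentangling the circuits (shrinking the shared set) directly lowers the predicted interference, which is the motivation for reshaping before editing.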
REdit has three main components. Contrastive Circuit Reshaping separates overlapping pathways so that edits are more local. Meta‑Contrastive Learning helps the reshaped circuits transfer to new but related reasoning patterns, improving generalization beyond the specific examples seen during reshaping. Dual‑Level Protection preserves existing abilities during reshaping by constraining update directions with a soft null‑space projection and by regularizing task‑level prediction distributions. After reshaping, the authors apply a standard parameter‑efficient editing technique to finalize the correction.
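To make the soft null‑space projection idea concrete, here is a minimal NumPy sketch. It is an assumption‑laden illustration, not the paper's implementation: `K` stands in for hidden representations of abilities to be preserved, and `alpha` interpolates between the raw update (`alpha = 0`) and a hard null‑space projection (`alpha = 1`) so the constraint is "soft".

```python
import numpy as np

def soft_nullspace_project(delta: np.ndarray, K: np.ndarray,
                           alpha: float = 0.9) -> np.ndarray:
    """Softly project a weight update `delta` (d x d) into the null space
    of the rows of K (n x d), so the update barely moves preserved keys.
    All names and the soft-interpolation form are illustrative."""
    # Orthonormal basis of K's row space via SVD.
    _, s, vt = np.linalg.svd(K, full_matrices=False)
    rank = int(np.sum(s > 1e-10))
    V = vt[:rank].T                              # d x r basis of preserved subspace
    P = V @ V.T                                  # projector onto that subspace
    hard = delta @ (np.eye(K.shape[1]) - P)      # hard null-space projection
    return (1 - alpha) * delta + alpha * hard    # soften toward the raw update

rng = np.random.default_rng(0)
K = rng.standard_normal((5, 16))      # 5 preserved-task keys in d = 16
delta = rng.standard_normal((16, 16)) # candidate edit direction
safe = soft_nullspace_project(delta, K, alpha=1.0)
print(np.abs(safe @ K.T).max())       # near zero: preserved keys unaffected
```

With `alpha = 1.0` the projected update annihilates the preserved keys exactly (up to floating‑point error); smaller `alpha` trades some of that protection for a stronger edit, which is the "soft" part of the constraint.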
To test the idea, the authors focus on propositional logic, a setting where reasoning patterns can be defined and measured precisely. They run extensive experiments with the Qwen‑2.5‑3B model on propositional logic tasks at three difficulty levels. The paper reports that REdit consistently achieves better generality (the edited rule holds across different examples of the same pattern) and better locality (other correct reasoning behaviors are preserved) than strong baseline editing methods. The authors also give additional validation on math problems to show the approach has broader potential.