Logic in Computer ScienceEnglishPublished

An AI writes an SMT solver from scratch: a case study with no human-written code

March 14, 2026arXiv: 2603.06931v2

Researchers asked a large language model (LLM) coding agent to write a complete Satisfiability Modulo Theories (SMT) solver with no human-written code. The agent produced a DPLL(T)-style solver for the quantifier-free theory of equality with uninterpreted functions (QF_UF). The project includes a congruence closure algorithm for equality reasoning, preprocessing steps, integration with an external SAT solver, and the ability to emit Lean proofs for some unsatisfiable cases.

The team ran the whole development inside ClaudeCode using the Sonnet 4.6 model. The solver is written in C++20 and uses CaDiCaL as the SAT solver via the IPASIR-UP interface. For equality reasoning, the agent implemented the Nieuwenhuis–Oliveras congruence closure algorithm, a standard method that computes which terms must be equal under the current assumptions. The tool also supports the SMT-LIB input format and a get-model command so that produced models can be checked by a reference solver.

Development was iterative and required many prompts and fixes. Early versions missed support for Boolean connectives and had parsing and congruence-closure bugs. The agent at one point wrote a simple internal SAT solver before being guided to integrate CaDiCaL. The authors used automated testing ideas—random formula generation and differential testing against a reference solver—to find and fix errors. They also added a timeout wrapper after noticing background processes could run indefinitely.

Preprocessing mattered for performance. The agent implemented simplifications and a special preprocessing trick for so-called equational diamond benchmarks. These benchmarks are trivially unsatisfiable but difficult for lazy SMT solvers unless common equalities across disjuncts are pulled out first. The LLM devised a method that computes equality closures inside each disjunct and adds the equalities that appear in all branches, which made those problems run much faster.