CHIA: an open-source system to run AI-driven hardware and software co-design loops
This paper introduces CHIA, an open-source framework that helps researchers build, run, and study AI-driven workflows that co-design hardware and software. Its goal is to make it easy to express complex design procedures that combine traditional tools, such as simulators and chip-build systems, with agentic artificial intelligence (AI systems that carry out multi-step actions). CHIA treats the design flow itself as an object of study, not just the chips or programs it produces.
A CHIA workflow is written as a CHIA loop: a directed graph, which may contain cycles, whose nodes run standard tools, simulators, build systems, AI models, evolutionary agents, and more. The CHIA library already includes node implementations for many popular tools, including Chipyard, gem5, ChampSim, FireSim, Hammer (which interfaces with several commercial ASIC computer-aided design tools), Vivado, AlphaEvolve, and AdaEvolve. CHIA also provides built-in features that rigorous experiments require: isolation between AI models and hardware tools, profiling, fault-tolerant execution, and reliable operation at scale across hundreds of heterogeneous machines, including CPUs, field-programmable gate arrays (FPGAs), graphics processing units (GPUs), and both cloud and on-premises servers.
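The loop structure described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Node` and `Loop` classes, the retry policy, and the toy simulator are illustrative assumptions and do not reflect CHIA's actual API. The sketch only shows how nodes connected by a back edge can iterate until a convergence test passes, with simple per-node retries standing in for fault-tolerant execution.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

State = Dict[str, object]


@dataclass
class Node:
    """One step in a hypothetical loop, e.g. a simulator run or an AI-model call."""
    name: str
    run: Callable[[State], State]
    max_retries: int = 2  # hypothetical policy: re-run a failing step a few times

    def execute(self, state: State) -> State:
        last_error: Optional[Exception] = None
        for _ in range(self.max_retries + 1):
            try:
                # Shallow copy: a failed attempt cannot overwrite top-level keys
                # of the caller's state.
                return self.run(dict(state))
            except Exception as err:
                last_error = err
        raise RuntimeError(f"node {self.name!r} failed") from last_error


@dataclass
class Loop:
    """A directed graph of nodes; edges may point backward, forming cycles."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    # For each node, a function that picks the next node name (or None to stop).
    edges: Dict[str, Callable[[State], Optional[str]]] = field(default_factory=dict)

    def add(self, node: Node, next_fn: Callable[[State], Optional[str]]) -> None:
        self.nodes[node.name] = node
        self.edges[node.name] = next_fn

    def run(self, start: str, state: State, max_steps: int = 100) -> State:
        current: Optional[str] = start
        steps = 0
        while current is not None:
            if steps >= max_steps:
                raise RuntimeError("loop did not converge within max_steps")
            state = self.nodes[current].execute(state)
            current = self.edges[current](state)
            steps += 1
        return state


# Toy example: tune a model parameter until its cycle count matches a reference.
REFERENCE_CYCLES = 1000  # stand-in for a ground-truth RTL simulation result


def simulate(state: State) -> State:
    state["cycles"] = 800 + 10 * state["latency"]  # toy performance model
    return state


def compare(state: State) -> State:
    state["error"] = abs(state["cycles"] - REFERENCE_CYCLES) / REFERENCE_CYCLES
    return state


def propose(state: State) -> State:
    # Stand-in for an agent editing the model; here, a fixed step toward the target.
    state["latency"] += 5 if state["cycles"] < REFERENCE_CYCLES else -5
    return state


loop = Loop()
loop.add(Node("simulate", simulate), lambda s: "compare")
loop.add(Node("compare", compare), lambda s: None if s["error"] < 0.01 else "propose")
loop.add(Node("propose", propose), lambda s: "simulate")  # back edge closes the cycle
final = loop.run("simulate", {"latency": 0})
print(final)  # {'latency': 20, 'cycles': 1000, 'error': 0.0}
```

Routing through a per-node successor function, rather than a fixed edge list, lets one node (here `compare`) decide at run time whether to exit the cycle or continue, which is the minimal ingredient a cyclic design loop needs.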
To demonstrate CHIA, the paper presents five example CHIA loops: automatic alignment of gem5 simulators with register-transfer-level (RTL) designs; using large language models (LLMs) to help implement microarchitectural features in RTL; agent-driven optimization of a design's critical timing path; evolutionary search to discover new microarchitectures; and an agent that triages and fixes GitHub issues for the CIRCT compiler.

These examples produce concrete results. An agentic flow raised a gem5 core model's accuracy from about 40% to 97.2%, measured against ground-truth RTL simulation on 36 benchmarks, over 10.5 days. An agent-implemented RISC-V instruction-set extension delivered speedups of 5.6% and 3.8% on portions of the SPEC CPU2006 reference suite and a 10× speedup on OpenSSL, with under 5% area overhead and no change in clock frequency. A critical-path RTL rewrite raised frequency by 2.03× at the cost of only a 3.28% drop in instructions per cycle (IPC), for a net 1.96× performance gain. The RTL changes were validated by running the full SPEC CPU2006 reference workload (more than 25 trillion instructions) on a RISC-V system-on-chip.
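The net 1.96× figure for the critical-path rewrite can be checked by assuming the instruction count is unchanged, so that performance scales with the product of clock frequency $f$ and IPC:

```latex
\frac{\mathrm{Perf}_{\text{new}}}{\mathrm{Perf}_{\text{old}}}
  = \frac{f_{\text{new}}}{f_{\text{old}}} \cdot \frac{\mathrm{IPC}_{\text{new}}}{\mathrm{IPC}_{\text{old}}}
  = 2.03 \times (1 - 0.0328)
  = 2.03 \times 0.9672
  \approx 1.96
```

The small IPC loss therefore costs only about 0.07× of the 2.03× frequency gain.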