Using physics-based simulations to co-design ultra-efficient AI hardware
This paper argues that simulations based on first principles — that is, on fundamental physics without fitted parameters — can guide the joint design of materials, devices, interconnects, circuits and architectures to sharply cut the energy cost of large AI models. The authors say that modern generative-AI workloads spend most of their compute and energy on matrix-vector and matrix-matrix multiplications (MatMul). If engineers can find device and interconnect operating points suited to those dense linear algebra operations, energy per token could fall by large factors compared with today’s digital CMOS (complementary metal–oxide–semiconductor) accelerators.
The authors first explain why MatMul matters. In transformer-style models like GPT, layers dominated by MatMul — for example in multi-head self-attention and the feed-forward networks — require far more operations than nonlinear activations or normalization. The paper cites representative GPT-3 numbers (batch size B = 512, sequence length S = 2048, model dimension d_model = 12,288, feed-forward size d_ff = 49,152, and 96 layers) to show how these linear algebra steps dominate cost. For high-throughput training and large models, the best practical platforms remain GPU (graphics processing unit), TPU (tensor processing unit) and similar accelerators, so any new device must match that throughput while using much less energy per operation.
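The dominance of MatMul can be checked directly from the cited dimensions. The sketch below counts floating-point operations for the MatMuls in one transformer layer (QKV and output projections, attention scores and weighted sum, and the two feed-forward MatMuls) and compares them against the elementwise work of softmax, GELU, and LayerNorm. The per-element cost of the nonlinear operations (here, a nominal 10 FLOPs per element) is an illustrative assumption, not a figure from the paper.

```python
# FLOP accounting for one transformer layer, using the GPT-3 figures
# cited in the text (the full model stacks 96 such layers).
B, S, d_model, d_ff = 512, 2048, 12_288, 49_152

# QKV + output projections: four (d_model x d_model) MatMuls per token
proj = 2 * 4 * B * S * d_model**2            # 2 FLOPs per multiply-add
# attention scores Q·K^T and the weighted sum over V
attn = 2 * 2 * B * S**2 * d_model
# two feed-forward MatMuls: d_model -> d_ff -> d_model
ffn = 2 * 2 * B * S * d_model * d_ff
matmul_flops = proj + attn + ffn

# elementwise work (softmax, GELU, LayerNorm) at an assumed ~10 FLOPs/element
elementwise = 10 * (B * S**2 + 2 * B * S * d_model + B * S * d_ff)

print(f"MatMul share of layer FLOPs: {matmul_flops / (matmul_flops + elementwise):.4%}")
```

With these dimensions the MatMul share exceeds 99.9%, which is why the authors treat dense linear algebra as the target for device-level energy savings.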
Their central proposal is co-design guided by predictive first-principles device and interconnect simulations. By “predictive” the authors mean simulators that do not rely on empirical fitting but compute electrical behavior from materials and device physics. Those simulators can map design knobs — geometry, materials, doping, and layout — to circuit-level metrics such as delay, energy, parasitic losses, and variability. Feeding those metrics into workload-level models (for example, MatMul energy and data-movement cost for transformer layers) closes the loop from nanoscale physics to system performance.
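The loop from nanoscale physics to system performance can be sketched as a simple workload-level energy model. In the toy model below, the per-MAC energy `e_mac_j` and per-bit data-movement energy `e_bit_j` stand in for the circuit-level metrics a first-principles simulator would produce; the numerical values are hypothetical placeholders, not results from the paper, and the model counts only MatMul compute plus one read of each weight matrix per batch.

```python
# Sketch of the physics-to-system loop: device/interconnect simulators
# supply per-operation energies, which a workload model turns into
# energy per transformer layer. All numbers here are illustrative.

def layer_energy(B, S, d_model, d_ff, e_mac_j, e_bit_j, bits_per_word=16):
    """Energy (J) for one layer's MatMuls plus weight-matrix data movement,
    assuming each weight word is read once per batch."""
    macs = (4 * B * S * d_model**2           # QKV + output projections
            + 2 * B * S**2 * d_model         # attention scores + values
            + 2 * B * S * d_model * d_ff)    # feed-forward network
    weight_words = 4 * d_model**2 + 2 * d_model * d_ff
    return macs * e_mac_j + weight_words * bits_per_word * e_bit_j

# Hypothetical operating points: a digital CMOS baseline vs. a candidate
# device whose simulated MAC energy is 20x lower (placeholder values).
cmos = layer_energy(512, 2048, 12_288, 49_152, e_mac_j=1e-12, e_bit_j=1e-13)
novel = layer_energy(512, 2048, 12_288, 49_152, e_mac_j=5e-14, e_bit_j=1e-13)
print(f"energy ratio (CMOS / candidate device): {cmos / novel:.1f}x")
```

Swapping in simulator-predicted energies for different geometries, materials, or interconnect layouts would let such a model rank design points by energy per token, which is the closed loop the authors describe.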