Researchers propose a “schema-gated” way to let chatty AIs run reproducible science workflows
This paper looks at how large language models (LLMs) can turn a researcher’s plain-language request into running code and tools, while still meeting scientific needs for repeatability and traceability. The authors argue that conversational flexibility—talking naturally to an AI—often clashes with the need for deterministic execution, meaning the same inputs should give the same, auditable results. They propose a design called schema-gated orchestration to separate what the AI can say from what actually runs.
To ground their idea, the team talked to people who run research and development in industry. They conducted 20 interview sessions with 18 experts spanning 10 R&D stakeholder groups. From those interviews they found two dominant requirements: execution determinism (stable, repeatable runs that can be audited) and conversational flexibility (the ability to explore and iterate using natural language). System integration and workflow automation were the most common practical concerns raised by interviewees.
The authors also surveyed 20 representative systems spanning a range of architectures. They scored each system on two axes, execution determinism and conversational flexibility, using a multi-model protocol: 15 independent scoring sessions across three different LLM families, with strong agreement between models (Krippendorff’s alpha = 0.80 for determinism and 0.98 for flexibility). The review found a clear trade-off: no system combined high determinism with high conversational flexibility. The authors describe the surveyed systems as tracing a Pareto front, where gains on one axis come at the cost of the other.
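The trade-off can be pictured with a small sketch. The system names and scores below are invented for illustration, not the paper's survey data; the point is only the shape of the comparison: a system sits on the Pareto front if no other system beats it on both axes at once.

```python
def pareto_front(systems):
    """Return systems not dominated on both axes (higher is better)."""
    front = []
    for name, det, flex in systems:
        dominated = any(
            d >= det and f >= flex and (d > det or f > flex)
            for n, d, f in systems if n != name
        )
        if not dominated:
            front.append((name, det, flex))
    return front

# Hypothetical (determinism, flexibility) scores on a 0-10 scale.
systems = [
    ("pipeline_runner", 9, 2),  # highly deterministic, rigid
    ("chat_agent", 2, 9),       # very flexible, hard to reproduce
    ("hybrid_tool", 6, 5),
    ("legacy_script", 4, 1),    # beaten on both axes by hybrid_tool
]

front = pareto_front(systems)
# No entry scores high on both axes; the non-dominated set traces
# the trade-off frontier the paper describes.
```

In this toy example `legacy_script` is dominated and drops out, while the other three survive because each is best in its own direction; the paper's observation is that, across real systems, no survivor sits in the high-determinism, high-flexibility corner.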
Schema-gated orchestration is their proposed way to resolve the tension. A schema here means a machine-checkable specification of the full action to be run, including dependencies across steps. Under schema-gating, nothing executes until the proposed plan validates against that schema. The paper distills three practical rules to follow: clarify the user’s intent before running anything; keep planning and acting constrained so each automated step is checked; and gate tool use at the workflow level rather than letting an LLM directly invoke arbitrary tools.
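The gating idea above can be sketched in a few lines. Everything here is an illustrative assumption rather than the paper's actual specification: the tool allow-list, the plan format, and the helper names (`validate_plan`, `gated_execute`) are made up to show the pattern of validating a whole plan, including cross-step dependencies, before anything runs.

```python
# Hypothetical allow-list: the gate rejects any tool not named here,
# so an LLM cannot invoke arbitrary tools directly.
ALLOWED_TOOLS = {"fetch_data", "run_analysis", "render_report"}

def validate_plan(plan):
    """Check a proposed plan against the schema; return a list of errors.

    The sketch's schema requires that each step has an id, uses an
    allowed tool, and depends only on earlier steps, so execution
    order is well-defined and auditable.
    """
    errors = []
    seen = set()
    for step in plan.get("steps", []):
        sid = step.get("id")
        if not sid:
            errors.append("step missing id")
            continue
        if step.get("tool") not in ALLOWED_TOOLS:
            errors.append(f"{sid}: tool not in allow-list")
        for dep in step.get("deps", []):
            if dep not in seen:
                errors.append(f"{sid}: unknown or forward dependency {dep!r}")
        seen.add(sid)
    return errors

def gated_execute(plan, run_step):
    """Run the plan only if the whole thing validates; otherwise nothing executes."""
    errors = validate_plan(plan)
    if errors:
        raise ValueError("plan rejected: " + "; ".join(errors))
    return [run_step(step) for step in plan["steps"]]

# A plan an LLM might propose; it runs only after passing the gate.
plan = {
    "steps": [
        {"id": "s1", "tool": "fetch_data", "deps": []},
        {"id": "s2", "tool": "run_analysis", "deps": ["s1"]},
    ]
}
results = gated_execute(plan, run_step=lambda s: f"ran {s['id']}")
```

The design choice this illustrates is the paper's third rule: the model proposes a complete plan as data, and a separate, deterministic validator decides whether anything executes, rather than the model calling tools one at a time.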