New study maps 193 security threats for multi‑agent AI and finds major gaps in existing frameworks
This paper looks at security risks that arise when many autonomous AI agents work together. These “multi‑agent systems” share memory, call external tools, and talk with one another. That combination creates vulnerabilities that differ from the threats usually studied for single AI models. The authors set out to list those risks and then measure how well current AI security frameworks cover them.
The team used a four‑phase process. First, they built a deep technical knowledge base about production multi‑agent architectures. That work spans 86 chapters in ten parts and thousands of pages, and covers components such as tool integration, shared memory, and multi‑agent communication. Second, they asked a generative AI system to model threats against that technical base, producing about 1,700 candidate issues. Those candidates were reviewed and refined by an NVIDIA‑certified expert and reduced to 193 distinct main threat items. Third, they turned each threat into a focused survey plan. Fourth, they began executing the survey and a temporal analysis to see how threats mature over time.
The 193 threats are grouped into nine categories: Agent–Tool Coupling, Data Leakage, Injection, Identity and Provenance, Memory Poisoning, Non‑Determinism, Trust Exploitation, Timing and Monitoring, and Workflow Architecture. The authors scored 16 existing security and governance frameworks against every threat using a simple three‑point scale. No framework achieved majority coverage of any single category. Two areas were especially under‑addressed: Non‑Determinism (average score 1.231) and Data Leakage (1.340). The OWASP Agentic Security Initiative had the highest overall coverage at 65.3% and led on design‑phase coverage. The CDAO Generative AI Responsible AI Toolkit led in development and operational coverage. The authors also found five threat items that none of the 16 frameworks covered.
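To make the scoring step concrete, here is a minimal sketch of how per-category averages, per-framework coverage, and uncovered threats could be computed from such a score matrix. It is an illustration under stated assumptions, not the authors' method: the scale values (1 = not addressed, 2 = partially addressed, 3 = addressed), the definition of coverage as the share of threats scoring above 1, and all threat IDs, framework names, and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumption: the three-point scale runs 1..3, with 1 meaning "not addressed".
NOT_COVERED = 1

# Hypothetical toy data: four threats instead of the paper's 193.
THREAT_CATEGORY = {
    "T1": "Non-Determinism",
    "T2": "Non-Determinism",
    "T3": "Data Leakage",
    "T4": "Injection",
}

# Hypothetical scores: two frameworks instead of the paper's 16.
SCORES = {
    "Framework A": {"T1": 1, "T2": 2, "T3": 1, "T4": 3},
    "Framework B": {"T1": 1, "T2": 1, "T3": 2, "T4": 3},
}


def category_averages(scores, threat_category):
    """Average score per threat category, pooled over all frameworks."""
    by_category = defaultdict(list)
    for framework_scores in scores.values():
        for threat, score in framework_scores.items():
            by_category[threat_category[threat]].append(score)
    return {cat: mean(vals) for cat, vals in by_category.items()}


def coverage(framework_scores):
    """Assumed definition: share of threats scored above NOT_COVERED."""
    covered = sum(1 for s in framework_scores.values() if s > NOT_COVERED)
    return covered / len(framework_scores)


def uncovered_threats(scores, threat_category):
    """Threats that every framework scored as NOT_COVERED."""
    return sorted(
        threat
        for threat in threat_category
        if all(fs[threat] == NOT_COVERED for fs in scores.values())
    )


if __name__ == "__main__":
    print(category_averages(SCORES, THREAT_CATEGORY))
    print({name: coverage(fs) for name, fs in SCORES.items()})
    print(uncovered_threats(SCORES, THREAT_CATEGORY))
```

Under these assumptions, a category average close to 1, like the figures reported for Non‑Determinism and Data Leakage, would mean that most frameworks do not address most threats in that category.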