AVISE: an open‑source framework that automates tests for AI security and jailbreaks
This paper introduces AVISE (AI Vulnerability Identification and Security Evaluation), a modular, open‑source framework meant to help researchers and practitioners find security weaknesses in AI systems. The authors built AVISE in Python and structured it into separate orchestration and interaction layers so it can run repeatable tests against models and systems. The main goal is to make security testing more systematic, repeatable, and easier to extend as AI models change.
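To make the layered design concrete, here is a minimal sketch of how an orchestration layer might drive an interaction layer to run a test set against a target model. The class and function names (`Orchestrator`, `InteractionLayer`, `TestCase`, `run_set`) are hypothetical illustrations, not AVISE's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces; AVISE's real class names and APIs may differ.

@dataclass
class TestCase:
    name: str
    prompts: List[str]  # multi-turn prompts, sent to the target in order

class InteractionLayer:
    """Wraps a target model behind a uniform query interface (black-box style)."""
    def __init__(self, query_fn: Callable[[str], str]):
        self.query_fn = query_fn

    def run_conversation(self, prompts: List[str]) -> List[str]:
        # Send each turn and collect the target's replies.
        return [self.query_fn(p) for p in prompts]

class Orchestrator:
    """Runs every test case in a SET against the target and collects outputs."""
    def __init__(self, interaction: InteractionLayer):
        self.interaction = interaction

    def run_set(self, test_set: List[TestCase]) -> Dict[str, List[str]]:
        return {tc.name: self.interaction.run_conversation(tc.prompts)
                for tc in test_set}

# Usage with a stand-in "model" that just echoes its prompt:
orch = Orchestrator(InteractionLayer(lambda p: f"echo: {p}"))
results = orch.run_set([TestCase("demo", ["turn 1", "turn 2"])])
```

Keeping the model behind a single `query_fn` is what lets the same orchestration code target different systems, which matches the paper's emphasis on extensibility.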
As a demonstration, the team used AVISE to automate a multi‑turn jailbreak attack. They extended a theory‑of‑mind style Red Queen attack into an Adversarial Language Model (ALM) augmented attack that probes language models across several turns of conversation, and packaged 25 jailbreak test cases into a Security Evaluation Test (SET). An Evaluation Language Model (ELM) judges whether each test case succeeded in “jailbreaking” the target model, that is, in producing an output that bypasses the model’s intended safety or instruction limits. On the authors’ test set the ELM scored 92% accuracy, an F1 score of 0.91, and a Matthews correlation coefficient of 0.83.
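For readers less familiar with these metrics, all three can be computed from the binary confusion matrix of the ELM's verdicts against human labels. The counts below are illustrative values chosen to show the calculation; they are not the paper's actual confusion matrix.

```python
import math

def judge_metrics(tp: int, fp: int, tn: int, fn: int):
    """Accuracy, F1, and Matthews correlation coefficient (MCC)
    from a binary confusion matrix (true/false positives/negatives)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # MCC ranges from -1 to +1 and stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return accuracy, f1, mcc

# Illustrative counts (100 labeled verdicts, 8 judge errors):
acc, f1, mcc = judge_metrics(tp=45, fp=4, tn=47, fn=4)
# acc == 0.92 for these counts; f1 and mcc come out slightly higher
# than the paper's 0.91 and 0.83, since the real error split differs.
```

MCC is a sensible companion to accuracy here because jailbreak successes and failures need not be balanced classes in a test set.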
At a high level, AVISE runs a chosen SET multiple times against a target system, collects the model outputs, and uses the ELM and other logic to decide whether a vulnerability was exposed. The framework supports black‑box testing (no internal access) and can also be used for grey‑box or white‑box tests when source code or internal details are available. The authors also emphasize running tests repeatedly: language models are stochastic, so their outputs can vary from run to run and a single trial can be misleading.
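The repeated-runs idea amounts to estimating a per-test success rate rather than recording a single pass/fail. A minimal sketch, assuming a `run_test` callable that executes one trial and returns whether the ELM judged it a jailbreak (the function name and the 30% stand-in probability are hypothetical):

```python
import random
from typing import Callable

def estimate_success_rate(run_test: Callable[[], bool], trials: int = 10) -> float:
    """Run one test case repeatedly and report the fraction of trials
    judged a successful jailbreak. More trials shrink the sampling noise
    introduced by the target model's stochastic outputs."""
    successes = sum(1 for _ in range(trials) if run_test())
    return successes / trials

# Stand-in stochastic test: "jailbreaks" with 30% probability per trial.
rng = random.Random(0)  # seeded for reproducibility of this demo
rate = estimate_success_rate(lambda: rng.random() < 0.3, trials=100)
```

A single trial of the stand-in test above would report either 0% or 100%; aggregating over many trials is what makes the vulnerability estimate meaningful.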
Why this matters: language models and other AI systems are being deployed in sensitive settings, and known attack methods such as prompt injection and jailbreaks can cause models to ignore safety instructions. Governments and standards bodies now recommend adversarial testing or red teaming as part of safe AI development. By providing an extensible, automated way to run tests and aggregate results, AVISE aims to make security evaluation more rigorous and reproducible. The authors used their SET to test nine recently released language models of different sizes and found all were vulnerable to the augmented Red Queen attack to varying degrees.