AVISE: an open framework that automates finding jailbreaks in language models
Researchers introduce AVISE (AI Vulnerability Identification and Security Evaluation), a modular open-source framework to find security problems in AI systems. As a demonstration, they built an automated Security Evaluation Test (SET) that looks for “jailbreaks” — crafted inputs that make a language model ignore its safety instructions. The SET has 25 test cases and an Evaluation Language Model (ELM) that judges whether each case succeeded. According to the paper excerpt, the ELM reached 92% accuracy, an F1 score of 0.91, and a Matthews correlation coefficient of 0.83 when deciding jailbreak outcomes.
To build the framework the authors implemented AVISE in Python and structured it with an Orchestration Layer and an Interaction Layer. AVISE is designed to be modular so researchers or companies can create tests that run with different levels of access: black-box (only interacting with the model like a normal user), grey-box, or white-box (with internal access). As a concrete example, the team extended a prior multi-turn attack called Red Queen into an Adversarial Language Model (ALM)–augmented attack and automated that as a SET within AVISE.
At a high level the Red Queen style of attack works over multiple turns of conversation. Instead of a single malicious input, the attacker shapes a sequence of messages so the target model gradually accepts or follows harmful instructions. The ALM augmentation means an adversarial language model helps craft those messages automatically. The ELM then evaluates each test run and labels whether the target was successfully jailbroken. The authors used this SET to test nine recently released language models of different sizes and report that all nine were vulnerable to the augmented Red Queen attack to varying degrees.
The work matters because AI systems are being used in sensitive areas and regulators and policymakers are increasingly asking for adversarial testing. The paper cites new rules such as the EU AI Act and a U.S. executive order that consider red teaming or adversarial testing a necessary part of AI deployment. AVISE aims to give researchers and practitioners an extensible, reproducible way to run automated tests and to handle a practical issue of AI evaluations: models are stochastic, so the framework supports repeated test runs and statistical aggregation rather than treating a single run as definitive.