SpecOps: a fully automated system that tests real-world AI agents through their user interfaces | arXiv News