First-hand information for everyone
This paper tests whether AI agents built from large language models can pool private information by trading in a simple prediction market.
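A minimal sketch of what such a setup might look like: each agent holds a private noisy signal about a binary event and trades against a simple automated market maker, nudging the price toward its own belief. The paper's actual mechanism is not shown here; all names are hypothetical, and a stub stands in for the LLM call so the loop is runnable.

```python
import random

def private_belief(signal: float, noise: float = 0.1) -> float:
    """Stub for an LLM-derived probability estimate (assumption)."""
    return min(1.0, max(0.0, signal + random.uniform(-noise, noise)))

def simulate_market(true_prob: float, n_agents: int = 20,
                    rounds: int = 50, step: float = 0.05) -> float:
    price = 0.5  # market starts uninformed
    signals = [true_prob] * n_agents  # each agent perceives this noisily
    for _ in range(rounds):
        agent = random.randrange(n_agents)
        belief = private_belief(signals[agent])
        # Each trade moves the price a small step toward the trader's belief,
        # so private information gradually leaks into the public price.
        price += step * (belief - price)
    return price

if __name__ == "__main__":
    random.seed(0)
    print(f"final market price: {simulate_market(true_prob=0.7):.3f}")
```

Even with this toy rule, the price drifts toward the true probability, which is the information-pooling effect the paper sets out to measure in LLM agents.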
Researchers introduce AVISE (AI Vulnerability Identification and Security Evaluation), a modular, open-source framework for finding security problems in AI systems.
Researchers simulated a simple exchange populated entirely by autonomous large language model (LLM) agents to study how AI forms price expectations.
Researchers introduce MathNet, a large collection of competition-level math problems paired with tests designed to push the limits of AI mathematical reasoning.
AI safety problems sometimes hide across many short interactions. A single conversation or log file can look harmless, but a small set of them, taken together, can reveal harmful behavior.
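The summary does not describe the paper's method, but the core idea can be sketched: give each interaction a weak risk score and only the aggregate across a user's sessions crosses an alert threshold. The keyword scorer, names, and threshold below are all illustrative assumptions standing in for a real classifier.

```python
from collections import defaultdict

RISKY_TERMS = {"exploit", "bypass", "payload"}

def message_risk(text: str) -> float:
    """Tiny stand-in scorer: fraction of risky terms present in a message."""
    words = set(text.lower().split())
    return len(words & RISKY_TERMS) / len(RISKY_TERMS)

def flag_users(logs: list[tuple[str, str]], threshold: float = 0.6) -> set[str]:
    """logs: (user_id, message) pairs. Flag users whose summed risk exceeds threshold."""
    totals = defaultdict(float)
    for user, text in logs:
        totals[user] += message_risk(text)
    return {user for user, total in totals.items() if total > threshold}

logs = [
    ("alice", "how do I write a payload parser?"),
    ("alice", "can I bypass the input check?"),
    ("alice", "is there a known exploit for this?"),
    ("bob", "what's for lunch?"),
]
print(flag_users(logs))  # no single message is alarming; together, alice is flagged
```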
This paper tackles a common failure in agentic multimodal models: they call external tools too often, even when the answer is already available in the context they were given.
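One hedged sketch of a mitigation in this spirit: try to answer from the provided context first, and fall back to the external tool only when confidence is low. `llm_answer_with_confidence` is a hypothetical stand-in for a model call, not the paper's method.

```python
def llm_answer_with_confidence(question: str, context: str) -> tuple[str, float]:
    """Stub: return an answer and a self-reported confidence (assumption)."""
    if question.lower() in context.lower():
        return ("answered from context", 0.9)
    return ("unknown", 0.2)

def answer(question: str, context: str, tool, threshold: float = 0.7) -> str:
    # First attempt to answer from what the model was already given ...
    ans, conf = llm_answer_with_confidence(question, context)
    if conf >= threshold:
        return ans
    # ... and only invoke the external tool when the model is unsure.
    return tool(question)

def web_search(q: str) -> str:  # placeholder external tool
    return f"search result for {q!r}"

print(answer("capital of France", "The capital of France is Paris.", web_search))
```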
This paper checks how large, pre-trained medical “foundation” models behave when asked to find traumatic bowel injury on CT scans.
What happens when large language models (LLMs) face economic choices? This paper tests whether LLMs behave like humans when making such decisions.
This paper introduces LiveMedBench, a new benchmark designed to test Large Language Models (LLMs) on real clinical problems while avoiding contamination from data the models may have seen during training.