arXiv News

First-hand information for everyone

© 2026 arXiv News



Today's Briefing

Friday, May 1, 2026
Artificial Intelligence
Featured briefing

Study: AI agents trade in prediction markets but struggle to learn from others when tasks get complex

This paper tests whether AI agents built from large language models can pool private information by trading in a simple prediction market. T

April 24, 2026
EN
2 min read

Latest Research

Artificial Intelligence
April 24, 2026

AVISE: an open framework that automates finding jailbreaks in language models

Researchers introduce AVISE (AI Vulnerability Identification and Security Evaluation), a modular open-source framework to find security prob

EN
2 min read
Artificial Intelligence
April 23, 2026

AVISE: an open‑source framework that automates tests for AI security and jailbreaks

This paper introduces AVISE (AI Vulnerability Identification and Security Evaluation), a modular, open‑source framework meant to help resear

EN
2 min read
Artificial Intelligence
April 21, 2026

LLM-based AI traders copy human trading biases — and prompts can dial market bubbles up or down

Researchers simulated a simple exchange populated entirely by autonomous large language model (LLM) agents to study how AI forms price expec

EN
2 min read
Artificial Intelligence
April 21, 2026

MathNet: a 30K+ multilingual Olympiad dataset and benchmark for math reasoning and math-aware search

Researchers introduce MathNet, a large collection of competition-level math problems and a set of tests designed to push how well AI can bot

EN
2 min read
Artificial Intelligence
April 14, 2026

Meerkat: a new way to find safety failures that only appear across many AI traces

AI safety problems sometimes hide across many short interactions. A single conversation or log file can look harmless, but when a small set

EN
2 min read
Artificial Intelligence
April 10, 2026

New training method teaches multimodal agents when not to call tools — Metis cuts tool calls from 98% to 2%

This paper tackles a common failure in agentic multimodal models: they call external tools too often, even when an answer could be found in

EN
2 min read
Artificial Intelligence
March 27, 2026

Foundation models match overall accuracy for traumatic bowel injury but give many false alarms when other organ injuries are present

This paper checks how large, pre-trained medical “foundation” models behave when asked to find traumatic bowel injury on CT scans. Foundatio

EN
2 min read
Artificial Intelligence
March 27, 2026

Study finds large language models show systematic human-like biases in preferences but more rational beliefs

What happens when large language models (LLMs) face economic choices? This paper tests whether LLMs behave like humans in decisions about pr

EN
2 min read
Artificial Intelligence
March 27, 2026

LiveMedBench: a weekly, contamination‑free medical test set that scores LLMs with automated rubrics

This paper introduces LiveMedBench, a new benchmark designed to test Large Language Models (LLMs) on real clinical problems while avoiding c

EN
2 min read