arXiv News

First-hand information for everyone



Today's Briefing

Friday, May 1, 2026
Computer Vision
Featured briefing

Tuna-2 drops pretrained vision encoders and learns directly from pixels for image understanding and generation

This paper presents Tuna-2, a multimodal AI model that works directly from raw pixels instead of relying on a separate, pretrained vision encoder.

April 28, 2026
EN
2 min read

Latest Research

Computer Vision
April 16, 2026

Lyra 2.0 makes long, explorable 3D worlds from a single photo by fixing two common failure modes

This paper describes Lyra 2.0, a system that starts from a single image and generates long, camera-controlled videos that can be lifted into explorable 3D scenes.

EN
2 min read
Computer Vision
April 15, 2026

Lyra 2.0 makes large, explorable 3D scenes from one image by fixing two common video-generation failures

Lyra 2.0 is a method that starts from a single photo and lets a user explore a large, synthetic 3D world. The system first generates a camera-controlled video…

EN
2 min read
Artificial Intelligence
April 10, 2026

New training method teaches multimodal agents when not to call tools — Metis cuts tool calls from 98% to 2%

This paper tackles a common failure in agentic multimodal models: they call external tools too often, even when an answer could be found in…

EN
2 min read
Machine Learning
April 8, 2026

HaloProbe: a Bayesian probe that detects and reduces object hallucinations in image captions without changing the model

Large vision-language models can name objects that are not actually in an image. This paper studies that problem, which is called object hallucination.

EN
2 min read
Computer Vision
April 1, 2026

New benchmark shows AI still struggles to write PhD‑level 3D computer‑vision code

Researchers introduce GeoCodeBench, a new benchmark designed to test whether large language models can write the kind of precise code used in 3D computer vision.

EN
2 min read
Artificial Intelligence
March 27, 2026

Foundation models match overall accuracy for traumatic bowel injury but give many false alarms when other organ injuries are present

This paper checks how large, pre-trained medical “foundation” models behave when asked to find traumatic bowel injury on CT scans. Foundation models…

EN
2 min read
Computer Vision
March 27, 2026

MuRF: Combine low- and high-resolution views at inference to improve vision foundation models

This paper introduces MuRF, short for Multi-Resolution Fusion. The idea is simple. Instead of feeding a single resized image to a pre-trained model…

EN
2 min read
Artificial Intelligence
March 24, 2026

ThinkJEPA blends a dense video predictor with a vision–language “thinker” to forecast longer-range hand movements

This paper presents ThinkJEPA, a method that combines two ways of understanding video to predict future states for tasks like hand manipulation.

EN
2 min read
Computer Vision
March 20, 2026

New 'TUSA' method teaches AI to read ultrasound by learning textures

Ultrasound pictures look different from ordinary photos. They are made from echoes of sound and show characteristic gray-scale textures.

EN
2 min read