First-hand information for everyone
This paper presents Tuna-2, a multimodal AI model that works directly from raw pixels instead of relying on a separate, pretrained vision encoder.
This paper describes Lyra 2.0, a system that starts from a single photo and lets a user explore a large, synthetic 3D world. The system first generates long, camera-controlled videos, which can then be lifted into an explorable 3D representation.
This paper tackles a common failure in agentic multimodal models: they call external tools too often, even when the answer could be found in the input itself.
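The over-calling failure above can be illustrated with a minimal sketch. This is not the paper's method; the gating function, names, and threshold below are all hypothetical, showing only the general idea of answering directly when the model is confident and falling back to a tool otherwise.

```python
# Hypothetical sketch: gate external tool calls on the model's own
# confidence, so the agent answers directly when it can.
def answer_with_tool_gate(question, model_answer, confidence, tool, threshold=0.8):
    # Call the external tool only when the model is unsure of its answer.
    if confidence >= threshold:
        return model_answer   # answer found without a tool call
    return tool(question)     # fall back to the external tool

lookup = lambda q: f"tool result for {q!r}"
print(answer_with_tool_gate("capital of France?", "Paris", 0.95, lookup))
print(answer_with_tool_gate("obscure fact?", "not sure", 0.20, lookup))
```

In this toy setup the first call returns the model's own answer and the second triggers the tool; a real system would estimate confidence from the model itself.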
Large vision-language models can name objects that are not actually in an image. This paper studies that problem, which is called object hallucination.
Researchers introduce GeoCodeBench, a new benchmark designed to test whether large language models can write precise, domain-specific code.
This paper examines how large, pre-trained medical “foundation” models behave when asked to find traumatic bowel injury on CT scans.
This paper introduces MuRF, short for Multi-Resolution Fusion. The idea is simple: instead of feeding a single resized image to a pre-trained backbone, the model fuses features extracted at multiple resolutions.
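The general idea behind multi-resolution fusion can be sketched in a few lines. This is a generic illustration, not MuRF's actual architecture: the backbone is replaced by a trivial pooling function, the resize is nearest-neighbour, and the fusion is a plain average, all of which are assumptions for the sketch.

```python
# Illustrative sketch of multi-resolution feature fusion (hypothetical,
# not the paper's architecture).
import numpy as np

def extract_features(image: np.ndarray) -> np.ndarray:
    # Stand-in for a frozen pre-trained backbone: global average pooling.
    return image.mean(axis=(0, 1))

def resize(image: np.ndarray, size: int) -> np.ndarray:
    # Nearest-neighbour resize to a square of side `size`.
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

def multi_resolution_fusion(image: np.ndarray, sizes=(64, 128, 256)) -> np.ndarray:
    # Run the same backbone on several resolutions and average the features.
    feats = [extract_features(resize(image, s)) for s in sizes]
    return np.mean(feats, axis=0)

img = np.random.rand(300, 400, 3)
fused = multi_resolution_fusion(img)
print(fused.shape)  # (3,)
```

A real system would use a learned fusion (e.g. attention or concatenation plus a projection) rather than a simple mean, but the structure is the same: one backbone, several input scales, one fused feature vector.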
This paper presents ThinkJEPA, a method that combines two ways of understanding video to predict future states for tasks like hand manipulation.
Ultrasound images look different from ordinary photos: they are formed from echoes of sound and show characteristic gray-scale textures.