First-hand information for everyone
This paper presents VLMaterial, a system that fuses camera images and millimeter-wave (mmWave) radar with a vision-language model to identif
This paper presents ThinkJEPA, a method that combines two ways of understanding video to predict future states for tasks like hand-manipulat