Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models Paper • 2603.18002 • Published Mar 18 • 13
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors Paper • 2602.21778 • Published Feb 25 • 14
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models Paper • 2602.13191 • Published Feb 13 • 31
Vision Arena (Testing VLMs side-by-side) • Explore AI-powered visual tasks in Vision Arena • 562
GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Paper • 2510.16136 • Published Oct 17, 2025 • 5
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Paper • 2504.08727 • Published Apr 11, 2025 • 12
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published Feb 12, 2025 • 59
CrossOver: 3D Scene Cross-Modal Alignment Paper • 2502.15011 • Published Feb 20, 2025 • 2 • 3