Article Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action nvidia • 7 days ago • 71
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 11 days ago • 139
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security Paper • 2605.29801 • Published 11 days ago • 142
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published May 4 • 129
Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 164
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 228
V^{2}-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence Paper • 2511.20886 • Published Nov 25, 2025 • 1
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published Oct 8, 2025 • 78
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published Dec 8, 2025 • 46
PriorCLIP: Visual Prior Guided Vision-Language Model for Remote Sensing Image-Text Retrieval Paper • 2405.10160 • Published May 16, 2024 • 1
RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events Paper • 2509.01907 • Published Sep 2, 2025 • 2
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images Paper • 2410.01768 • Published Oct 2, 2024 • 4
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era Paper • 2509.12989 • Published Sep 16, 2025 • 28
EarthSynth: Generating Informative Earth Observation with Diffusion Models Paper • 2505.12108 • Published May 17, 2025 • 2
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community Paper • 2408.09110 • Published Aug 17, 2024 • 2