OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 9 days ago
Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training Paper • 2602.05940 • Published 9 days ago
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025