UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs Paper • 2512.03383 • Published Dec 3, 2025 • 4
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms Paper • 2510.13913 • Published Oct 15, 2025 • 3
EgoVLM: Policy Optimization for Egocentric Video Understanding Paper • 2506.03097 • Published Jun 3, 2025
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math Paper • 2510.13744 • Published Oct 15, 2025 • 5
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published Nov 10, 2024 • 17
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents Paper • 2509.06283 • Published Sep 8, 2025 • 17
Arch-Router: Aligning LLM Routing with Human Preferences Paper • 2506.16655 • Published Jun 19, 2025 • 17
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation Paper • 2501.05414 • Published Jan 9, 2025 • 2
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19, 2025 • 17
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models Paper • 2503.22879 • Published Mar 28, 2025 • 9
Quamba: A Post-Training Quantization Recipe for Selective State Space Models Paper • 2410.13229 • Published Oct 17, 2024 • 1
Efficient Low-rank Backpropagation for Vision Transformer Adaptation Paper • 2309.15275 • Published Sep 26, 2023 • 1
MobileTL: On-device Transfer Learning with Inverted Residual Blocks Paper • 2212.03246 • Published Dec 5, 2022 • 1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates Paper • 2407.06249 • Published Jul 8, 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" Paper • 2410.03727 • Published Sep 30, 2024 • 2
Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation Paper • 2310.03780 • Published Oct 5, 2023