Yinxu Pan

cppowboy

https://github.com/Cppowboy

AI & ML interests

RL for LLM, Code&Math Reasoning, Function Calling, Code Interpreter, Vision-Language Pretraining

Recent Activity


upvoted 2 papers 1 day ago

Evaluating Parameter Efficient Methods for RLVR

Paper • 2512.23165 • Published 4 days ago • 19

End-to-End Test-Time Training for Long Context

Paper • 2512.23675 • Published 4 days ago • 14

upvoted a paper 3 days ago

Nested Browser-Use Learning for Agentic Information Seeking

Paper • 2512.23647 • Published 4 days ago • 15

upvoted 2 papers 4 days ago

SWE-RM: Execution-free Feedback For Software Engineering Agents

Paper • 2512.21919 • Published 7 days ago • 8

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Paper • 2512.22047 • Published 7 days ago • 25

upvoted 3 papers 8 days ago

SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

Paper • 2512.18470 • Published 13 days ago • 9

NVIDIA Nemotron 3: Efficient and Open Intelligence

Paper • 2512.20856 • Published 9 days ago • 27

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2512.20848 • Published 9 days ago • 28

upvoted a paper 11 days ago

SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

Paper • 2512.17419 • Published 14 days ago • 9

upvoted an article 18 days ago

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

Article • 18 days ago • 104

upvoted 2 papers 27 days ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 147

PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published about 1 month ago • 47

upvoted 2 papers about 1 month ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published about 1 month ago • 243

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 93

upvoted a collection about 1 month ago

Olmo 3 Post-training

Collection

All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated 10 days ago • 46

upvoted a paper about 1 month ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published Nov 24, 2025 • 60

upvoted a paper about 2 months ago

Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5, 2025 • 59

upvoted 3 papers 3 months ago

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published Oct 14, 2025 • 27

Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published Sep 23, 2025 • 67

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Paper • 2509.18154 • Published Sep 16, 2025 • 52
