Huan Wang

Huan-WhoRegisteredMyName

AI & ML interests

Assistant Professor at Westlake. PhD at NEU, MS & BE at ZJU. Work on Efficient AI, Machine Learning, Computer Vision, and MLSys.

Recent Activity

upvoted a paper 7 days ago

OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Paper • 2512.23646 • Published 9 days ago • 14

upvoted a paper about 2 months ago

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Paper • 2511.14582 • Published Nov 18, 2025 • 18

upvoted a paper 3 months ago

UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG

Paper • 2510.03663 • Published Oct 4, 2025 • 15

authored a paper 3 months ago

OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

Paper • 2510.06751 • Published Oct 8, 2025 • 21

upvoted a paper 3 months ago

OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

Paper • 2510.06751 • Published Oct 8, 2025 • 21

liked a model 3 months ago

lkongam/KernelCoder

33B • Updated Oct 6, 2025 • 9 • 2

authored a paper 3 months ago

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Paper • 2510.02240 • Published Oct 2, 2025 • 17

upvoted a paper 3 months ago

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Paper • 2510.02240 • Published Oct 2, 2025 • 17

liked a model 5 months ago

ByteDance-Seed/Seed-OSS-36B-Instruct

Text Generation • 36B • Updated Aug 26, 2025 • 8.61k • 469

authored a paper 5 months ago

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

Paper • 2507.20198 • Published Jul 27, 2025 • 26

upvoted a paper 5 months ago

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

Paper • 2507.20198 • Published Jul 27, 2025 • 26

authored 3 papers 7 months ago

upvoted a paper 7 months ago

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Paper • 2505.21334 • Published May 27, 2025 • 21

authored a paper 8 months ago

Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps

Paper • 2505.18675 • Published May 24, 2025 • 26

upvoted a paper 8 months ago

Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps

Paper • 2505.18675 • Published May 24, 2025 • 26

liked a model 9 months ago

Qwen/Qwen2.5-Omni-7B

Any-to-Any • 11B • Updated Apr 30, 2025 • 155k • 1.84k

upvoted a paper 10 months ago

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Paper • 2503.16257 • Published Mar 20, 2025 • 26

commented on a paper 10 months ago

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Paper • 2503.16257 • Published Mar 20, 2025 • 26
