Research Group

AI & ML interests

None defined yet.

Recent Activity

EliYuan00

authored a paper 8 days ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 12 days ago • 233

yifanzhang114

authored 4 papers 9 days ago

VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents

Paper • 2603.16289 • Published Mar 17

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

Paper • 2603.29620 • Published 17 days ago • 46

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Paper • 2604.03016 • Published 15 days ago • 37

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 12 days ago • 233

shenyunhang

authored 15 papers 9 days ago

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Paper • 2412.00876 • Published Dec 1, 2024

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Paper • 2412.04317 • Published Dec 5, 2024 • 1

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Paper • 2411.00774 • Published Nov 1, 2024

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Paper • 2502.05177 • Published Feb 7, 2025 • 2

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Paper • 2505.03739 • Published May 6, 2025 • 10

What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation

Paper • 2505.19569 • Published May 26, 2025

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Paper • 2501.05272 • Published Jan 9, 2025 • 1

Aligning and Prompting Everything All at Once for Universal Visual Perception

Paper • 2312.02153 • Published Dec 4, 2023

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Paper • 2510.09607 • Published Oct 10, 2025 • 2

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models

Paper • 2509.26165 • Published Sep 30, 2025

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Paper • 2409.05381 • Published Sep 9, 2024

Team members 4

Video-MME-v2-Team's activity