AI & ML interests

Computer Vision

OpenGVLab's collections (34)

InternVideo-Next
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
InternVL3.5-Core
This collection includes only the InternVL3.5 checkpoints that have completed the full training pipeline (i.e., Pretraining, SFT, MPO, Cascade RL).
Mono-InternVL
A Pioneering Monolithic MLLM
InternVL1.0
Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
InternVL3.5-Flash
InternVL3.5-Flash is a fast variant of InternVL3.5 that uses a semantic-aware dynamic high-resolution strategy.
InternVL3.5
This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL).
PIIP
[NeurIPS 2024 Spotlight (Ranking Top 10), TPAMI 2025] Parameter-Inverted Image Pyramid Networks
VideoChat-Flash
Faster and more powerful VideoChat.
InternVL2.5-MPO
Enhancing the Reasoning Ability of MLLMs via Mixed Preference Optimization
InternVL1.5
A Pioneering Open-Source Alternative to GPT-4V
InternImage
Exploring Large-Scale Vision Foundation Models with Deformable Convolutions