Vision Transformers
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper • 2309.04354
Vision Transformers Need Registers
Paper • 2309.16588
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Paper • 2309.16534
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper • 2201.12086
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376
Subobject-level Image Tokenization
Paper • 2402.14327
Scalable Diffusion Models with Transformers
Paper • 2212.09748
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Paper • 2408.04840
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
Paper • 2408.07246