kyutai/mimi • Feature Extraction • 96.2M • 1.4M • 274
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
ARC-Encoder: learning compressed text representations for large language models