Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published 12 days ago • 133
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 74
Open ASR Leaderboard 🏆 • 1.22k • Explore and compare speech-recognition model benchmarks
VBench Leaderboard 📊 • 347 • Submit video model evaluation results to update benchmark scores