Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation Paper • 2602.16705 • Published 9 days ago • 26
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model Paper • 2602.10098 • Published 17 days ago • 18
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published 24 days ago • 58
OmniSpatial Collection Collections of ICLR 2026 paper: "OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models" • 4 items • Updated Jan 27 • 1
GS-Reasoner Collection Collections of paper "Reasoning in Space via Grounding in the World" • 6 items • Updated Oct 20, 2025 • 2
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 214
ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks Paper • 2508.08240 • Published Aug 11, 2025 • 45
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 145
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Paper • 2507.13344 • Published Jul 17, 2025 • 59
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7, 2025 • 75
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 6, 2025 • 45
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135 • Published Jun 3, 2025 • 40
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding Paper • 2506.01853 • Published Jun 2, 2025 • 32
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30, 2025 • 97
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24, 2025 • 92
DreamLLM Collection [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation (https://arxiv.org/abs/2309.11499) • 6 items • Updated Mar 22, 2024 • 3
Unleashing Vecset Diffusion Model for Fast Shape Generation Paper • 2503.16302 • Published Mar 20, 2025 • 43