Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published 5 days ago • 186
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published 18 days ago • 110
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published 18 days ago • 118
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 25 days ago • 74
UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation Paper • 2601.11522 • Published 30 days ago • 17
RemoteVAR: Autoregressive Visual Modeling for Remote Sensing Change Detection Paper • 2601.11898 • Published 29 days ago • 4
Think3D: Thinking with Space for Spatial Reasoning Paper • 2601.13029 • Published 27 days ago • 47
Urban Socio-Semantic Segmentation with Vision-Language Reasoning Paper • 2601.10477 • Published about 1 month ago • 155
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8 • 166
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation Paper • 2512.24271 • Published Dec 30, 2025 • 63
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum Paper • 2510.27571 • Published Oct 31, 2025 • 19
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published Oct 14, 2025 • 113
Tree Search for LLM Agent Reinforcement Learning Paper • 2509.21240 • Published Sep 25, 2025 • 92
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20, 2025 • 53
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning Paper • 2503.07588 • Published Mar 10, 2025 • 7