MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published 14 days ago • 20
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published Oct 13, 2025 • 27
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers Paper • 2601.14133 • Published 22 days ago • 60
Endless Terminals: Scaling RL Environments for Terminal Agents Paper • 2601.16443 • Published 20 days ago • 16
Behavior Knowledge Merge in Reinforced Agentic Models Paper • 2601.13572 • Published 23 days ago • 24
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 20 days ago • 13
VibeVoice Collection Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 9 items • Updated 21 days ago • 207
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR Paper • 2601.14251 • Published 22 days ago • 24
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published 27 days ago • 9
BigVGAN Collection BigVGAN is a universal neural vocoder that generates audio waveform using mel spectrogram as input. • 11 items • Updated 7 days ago • 16
Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family 23 days ago • 80
MIRIAD: Augmenting LLMs with millions of medical query-response pairs Paper • 2506.06091 • Published Jun 6, 2025 • 11
Article How We Built a Semantic Highlight Model To Save Token Cost for RAG 28 days ago • 65
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published Jan 9 • 45