MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU • Paper • 2604.05091 • Published 5 days ago • 39
EXAONE 4.5 • Collection • LG's First Open-Weight Vision-Language Model for Industrial Intelligence • 3 items • Updated 2 days ago • 26
DFlash • Collection • Block Diffusion for Flash Speculative Decoding • 13 items • Updated 5 days ago • 47
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • Paper • 2603.24472 • Published 16 days ago • 51
MolmoWeb • Collection • This is the collection of MolmoWeb artifacts, including model checkpoints and data. • 6 items • Updated about 1 hour ago • 22
Devstral 2 • Collection • A couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents. • 2 items • Updated Mar 2 • 52
MiniCPM4 • Collection • MiniCPM4: Ultra-Efficient LLMs on End Devices • 30 items • Updated 5 days ago • 84