Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 7 days ago • 91
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published 5 days ago • 49
Improving Recursive Transformers with Mixture of LoRAs Paper • 2512.12880 • Published 22 days ago • 5
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28, 2025 • 17
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations Paper • 2512.14080 • Published 20 days ago • 5
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers Paper • 2512.16615 • Published 18 days ago • 4
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Paper • 2512.13586 • Published 21 days ago • 88
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse Paper • 2512.14531 • Published 20 days ago • 12
OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value Paper • 2512.14051 • Published 20 days ago • 40
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Paper • 2512.14681 • Published 20 days ago • 39