DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
Abstract
Diffusion large language models (dLLMs) for CUDA kernel generation achieve superior performance through a specialized dataset and reinforcement learning framework.
Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well-suited for code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs for CUDA kernel generation remains challenging, obstructed not only by the high specialization but also by the severe lack of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of it, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large language models designed for CUDA kernel generation, spanning three parameter scales, 1.7B, 4B, and 8B. Extensive experiments on KernelBench demonstrate that DICE significantly outperforms both autoregressive and diffusion LLMs of comparable scale, establishing a new state-of-the-art for CUDA kernel generation.
Community
DICE series are the first diffusion large language models (dLLMs) tailored for CUDA kernel generation. Welcome discussion and collaboration!
uuu sounds amazing! cant wait to try this one!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models (2025)
- Advancing Block Diffusion Language Models for Test-Time Scaling (2026)
- Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models (2026)
- Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding (2026)
- d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation (2026)
- FOCUS: DLLMs Know How to Tame Their Compute Bound (2026)
- Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper