DFlash: Block Diffusion for Flash Speculative Decoding
Paper • 2602.06036
Efficient AI
DFlash: Block Diffusion for Flash Speculative Decoding
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference