SnapKV: LLM Knows What You are Looking for Before Generation Paper • 2404.14469 • Published Apr 22, 2024 • 27
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks Paper • 2402.09025 • Published Feb 14, 2024 • 10
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Paper • 2505.13866 • Published May 20, 2025 • 17
FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration Paper • 2502.01068 • Published Feb 3, 2025 • 18
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning Paper • 2505.20355 • Published May 26, 2025 • 36