Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 26 days ago • 117
Flash-KMeans: Fast and Memory-Efficient Exact K-Means Paper • 2603.09229 • Published 23 days ago • 82
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation Paper • 2412.10704 • Published Dec 14, 2024 • 16
Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16, 2025 • 76
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 20 days ago • 87
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 263