GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23, 2024 • 22
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Paper • 2501.01005 • Published Jan 2, 2025 • 2