Circuit Transformer: A Transformer That Preserves Logical Equivalence Paper • 2403.13838 • Published Mar 14, 2024
AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference Paper • 2502.04077 • Published Feb 6, 2025
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Paper • 2503.20641 • Published Mar 26, 2025
TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling Paper • 2505.17155 • Published May 22, 2025
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Paper • 2502.04420 • Published Feb 6, 2025