Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning (arXiv:2505.13866, published May 20)
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning (arXiv:2510.14211, published Oct 16)
Retrospective Sparse Attention for Efficient Long-Context Generation (arXiv:2508.09001, published Aug 12)