Benchmark^2: Systematic Evaluation of LLM Benchmarks • Paper • 2601.03986 • Published 8 days ago