AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence Paper • 2511.01144 • Published Nov 3 • 3
Article CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models • May 24, 2024 • 22
REAL-MM-RAG-Bench Collection REAL-MM-RAG-Bench is a benchmark designed to evaluate multi-modal retrieval models under realistic and challenging conditions. • 4 items • Updated Mar 13 • 11