- BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models (arXiv:2401.12242, published Jan 20, 2024)
- AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases (arXiv:2407.12784, published Jul 17, 2024)
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI (arXiv:2409.18486, published Sep 27, 2024)
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities (arXiv:2502.12025, published Feb 17, 2025)
- MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models (arXiv:2503.14827, published Mar 19, 2025)
- UMD: Unsupervised Model Detection for X2X Backdoor Attacks (arXiv:2305.18651, published May 29, 2023)