SafeSwitch: Steering Unsafe LLM Behavior via Internal Activation Signals • Paper • 2502.01042 • Published Feb 3
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence • Paper • 2507.21046 • Published Jul 28
UserBench: An Interactive Gym Environment for User-Centric Agents • Paper • 2507.22034 • Published Jul 29
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering • Paper • 2509.09614 • Published Sep 11
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning • Paper • 2509.19736 • Published Sep 24
CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation • Paper • 2305.14318 • Published May 23, 2023
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents • Paper • 2402.09205 • Published Feb 14, 2024
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance • Paper • 2410.12361 • Published Oct 16, 2024
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents • Paper • 2502.09560 • Published Feb 13
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents • Paper • 2503.01935 • Published Mar 3
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset • Paper • 2504.03612 • Published Apr 4
The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination • Paper • 2502.16143 • Published Feb 22