Cheng Qian

chengq9

https://qiancheng0.github.io

qiancheng0

AI & ML interests

Agent, Tool Learning

Recent Activity

upvoted a paper 8 days ago

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

upvoted a paper 2 months ago

Multimodal Policy Internalization for Conversational Agents

upvoted a paper 2 months ago

Self-Improving LLM Agents at Test-Time

authored 6 papers 3 months ago

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 79

SafeSwitch: Steering Unsafe LLM Behavior via Internal Activation Signals

Paper • 2502.01042 • Published Feb 3 • 1

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28 • 82

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published Jul 29 • 29

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

Paper • 2509.09614 • Published Sep 11 • 7

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning

Paper • 2509.19736 • Published Sep 24 • 12

authored 11 papers 8 months ago

Tool Learning with Foundation Models

Paper • 2304.08354 • Published Apr 17, 2023 • 3

CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation

Paper • 2305.14318 • Published May 23, 2023

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

Paper • 2402.09205 • Published Feb 14, 2024

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Paper • 2410.12361 • Published Oct 16, 2024

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Paper • 2502.09560 • Published Feb 13 • 35

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

Paper • 2503.01935 • Published Mar 3 • 29

SMART: Self-Aware Agent for Tool Overuse Mitigation

Paper • 2502.11435 • Published Feb 17

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

Paper • 2504.03612 • Published Apr 4 • 2

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Paper • 2502.16143 • Published Feb 22 • 6

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 48

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 35