Self-Improving Agents
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Paper • 2410.22304 • Published • 18
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
Paper • 2410.19609 • Published • 18
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Paper • 2411.00412 • Published • 10
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning
Paper • 2410.02052 • Published • 9
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
Paper • 2410.01679 • Published • 27
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Paper • 2410.03864 • Published • 12
AFlow: Automating Agentic Workflow Generation
Paper • 2410.10762 • Published • 1
Boundless Socratic Learning with Language Games
Paper • 2411.16905 • Published • 2
Enabling Scalable Oversight via Self-Evolving Critic
Paper • 2501.05727 • Published • 72
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Paper • 2501.05707 • Published • 20
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper • 2501.11425 • Published • 109
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 160
SIM-CoT: Supervised Implicit Chain-of-Thought
Paper • 2509.20317 • Published • 41
TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published • 120
Toward Training Superintelligent Software Agents through Self-Play SWE-RL
Paper • 2512.18552 • Published • 1