hsvgbkhgbv's Collections
LLM papers
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper
•
2510.03222
•
Published
•
75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
•
2510.05592
•
Published
•
107
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
506
Multi-Agent Tool-Integrated Policy Optimization
Paper
•
2510.04678
•
Published
•
31
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper
•
2509.22576
•
Published
•
135
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
•
2509.08827
•
Published
•
190
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper
•
2509.08755
•
Published
•
57
GEM: A Gym for Agentic LLMs
Paper
•
2510.01051
•
Published
•
90
Agentic Entropy-Balanced Policy Optimization
Paper
•
2510.14545
•
Published
•
106
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
•
2510.09577
•
Published
•
8
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
•
2512.15687
•
Published
•
20
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Paper
•
2512.13607
•
Published
•
34
Paper
•
2512.16301
•
Published
•
105
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper
•
2512.13874
•
Published
•
17
Recursive Language Models
Paper
•
2512.24601
•
Published
•
81
Token-Level LLM Collaboration via FusionRoute
Paper
•
2601.05106
•
Published
•
39
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Paper
•
2601.09667
•
Published
•
86
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Paper
•
2601.15165
•
Published
•
71
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
•
2601.13572
•
Published
•
24
Learning to Discover at Test Time
Paper
•
2601.16175
•
Published
•
41
Endless Terminals: Scaling RL Environments for Terminal Agents
Paper
•
2601.16443
•
Published
•
16
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
•
2601.18778
•
Published
•
40
Linear representations in language models can change dramatically over a conversation
Paper
•
2601.20834
•
Published
•
21
Self-Distillation Enables Continual Learning
Paper
•
2601.19897
•
Published
•
24