LLM papers - a hsvgbkhgbv Collection

updated about 12 hours ago

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 76
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7, 2025 • 112
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 515
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 31
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 137
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 193
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Paper • 2509.08755 • Published Sep 10, 2025 • 56
GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 92
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16, 2025 • 108
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Paper • 2510.09577 • Published Oct 10, 2025 • 8
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Paper • 2512.15687 • Published Dec 17, 2025 • 22
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Paper • 2512.13607 • Published Dec 15, 2025 • 39
Adaptation of Agentic AI

Paper • 2512.16301 • Published Dec 18, 2025 • 111
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Paper • 2512.13874 • Published Dec 15, 2025 • 18
Recursive Language Models

Paper • 2512.24601 • Published Dec 31, 2025 • 96
Token-Level LLM Collaboration via FusionRoute

Paper • 2601.05106 • Published Jan 8 • 40
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Paper • 2601.09667 • Published Jan 14 • 92
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Paper • 2601.15165 • Published Jan 21 • 74
Behavior Knowledge Merge in Reinforced Agentic Models

Paper • 2601.13572 • Published Jan 20 • 27
Learning to Discover at Test Time

Paper • 2601.16175 • Published Jan 22 • 45
Endless Terminals: Scaling RL Environments for Terminal Agents

Paper • 2601.16443 • Published Jan 23 • 19
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published Jan 26 • 43
Linear representations in language models can change dramatically over a conversation

Paper • 2601.20834 • Published Jan 28 • 21
Self-Distillation Enables Continual Learning

Paper • 2601.19897 • Published Jan 27 • 36
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Paper • 2602.02488 • Published Feb 2 • 36
Self-Hinting Language Models Enhance Reinforcement Learning

Paper • 2602.03143 • Published Feb 3 • 31
Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 75
Multi-agent cooperation through in-context co-player inference

Paper • 2602.16301 • Published Feb 18 • 24
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

Paper • 2602.18071 • Published Feb 20 • 22
Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published Mar 3 • 105
Emergent Social Intelligence Risks in Generative Multi-Agent Systems

Paper • 2603.27771 • Published Mar 29 • 52
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

Paper • 2604.02029 • Published Apr 2 • 151
Memory Intelligence Agent

Paper • 2604.04503 • Published Apr 6 • 58
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Paper • 2604.06132 • Published Apr 7 • 121
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published Apr 6 • 41
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published Apr 9 • 291
Qualixar OS: A Universal Operating System for AI Agent Orchestration

Paper • 2604.06392 • Published Apr 7 • 19
MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Paper • 2604.18564 • Published Apr 20 • 46
Heterogeneous Scientific Foundation Model Collaboration

Paper • 2604.27351 • Published Apr 30 • 217
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

Paper • 2604.27221 • Published Apr 29 • 38
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

Paper • 2605.03042 • Published 30 days ago • 124
Trust-Region Behavior Blending for On-Policy Distillation

Paper • 2605.31159 • Published 5 days ago • 59