Collections
Discover the best community collections!
Collections including paper arxiv:2511.21689
-
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 97 -
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Paper • 2501.02790 • Published • 9 -
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Paper • 2509.25154 • Published • 29 -
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper • 2509.25760 • Published • 55
-
PretrainZero: Reinforcement Active Pretraining
Paper • 2512.03442 • Published • 44 -
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
Paper • 2512.03383 • Published • 3 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 100 -
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Paper • 2511.18890 • Published • 29
-
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 187 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 100 -
PretrainZero: Reinforcement Active Pretraining
Paper • 2512.03442 • Published • 44
-
PretrainZero: Reinforcement Active Pretraining
Paper • 2512.03442 • Published • 44 -
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
Paper • 2512.03383 • Published • 3 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 100 -
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Paper • 2511.18890 • Published • 29
-
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 187 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 100 -
PretrainZero: Reinforcement Active Pretraining
Paper • 2512.03442 • Published • 44
-
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 97 -
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Paper • 2501.02790 • Published • 9 -
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Paper • 2509.25154 • Published • 29 -
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper • 2509.25760 • Published • 55