Alan

wizardII
  • wizard-III

AI & ML interests

RL & LLM

Recent Activity

new activity 26 days ago
deepseek-ai/DeepSeek-V3.2: Unbelievable
liked a model 30 days ago
deepseek-ai/DeepSeek-Math-V2
updated a collection 3 months ago
Archer2.0

Organizations

Fate

upvoted 4 papers 3 months ago

ASPO: Asymmetric Importance Sampling Policy Optimization

Paper • 2510.06062 • Published Oct 7 • 13

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29 • 140

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30 • 16

Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25 • 89
upvoted 2 papers 5 months ago

SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7 • 46

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

Paper • 2507.15778 • Published Jul 21 • 20