Xiao Hu

huxiao09

AI & ML interests

Reinforcement Learning, LLM Reasoning

Organizations

None yet

Recent Activity

liked a model about 1 month ago

Kwai-Keye/Keye-VL-671B-A37B

Video-Text-to-Text • 672B • Updated Nov 20 • 114 • 18

upvoted a paper 4 months ago

Thyme: Think Beyond Images

Paper • 2508.11630 • Published Aug 15 • 81

authored 5 papers 6 months ago

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Paper • 2305.17400 • Published May 27, 2023

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Paper • 2402.03046 • Published Feb 5, 2024 • 7

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Paper • 2505.02835 • Published May 5 • 28

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Paper • 2505.21067 • Published May 27 • 3

Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2 • 130

upvoted a paper 6 months ago

Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2 • 130

upvoted a paper 7 months ago

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Paper • 2505.21067 • Published May 27 • 3

liked a model over 1 year ago

meta-llama/Meta-Llama-3-8B-Instruct

Text Generation • 8B • Updated Jun 18 • 1.53M • 4.34k

liked a model almost 2 years ago

meta-llama/Llama-2-7b-hf

Text Generation • 7B • Updated Apr 17, 2024 • 583k • 2.23k