anujga's Collections

Rl

updated Mar 18

  • RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

    Paper • 2307.12950 • Published Jul 24, 2023 • 10

  • HumanLLMs/Human-Like-DPO-Dataset

    Viewer • Updated Jan 12 • 10.9k • 1.22k • 242

  • sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo

    Viewer • Updated Oct 23, 2024 • 5.65k • 136 • 32

  • RLHFlow/Deepseek-PRM-Data

    Viewer • Updated Nov 9, 2024 • 253k • 69 • 17

  • RLHFlow/DS-and-Mistral-PRM-Data

    Viewer • Updated Nov 10, 2024 • 526k • 41

  • TIGER-Lab/WebInstruct-CFT

    Viewer • Updated Feb 2 • 654k • 173 • 56

  • deu05232/promptriever-ours2-filtered_FN

    Viewer • Updated Feb 10 • 1.31M • 132

  • argilla/distilabel-intel-orca-dpo-pairs

    Viewer • Updated Aug 7 • 12.9k • 3.92k • 181