Rl - a anujga Collection

Rl

updated Mar 18

RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment
Paper • 2307.12950 • Published Jul 24, 2023 • 10

HumanLLMs/Human-Like-DPO-Dataset
Viewer • Updated Jan 12 • 10.9k • 1.22k • 242

sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo
Viewer • Updated Oct 23, 2024 • 5.65k • 136 • 32

RLHFlow/Deepseek-PRM-Data
Viewer • Updated Nov 9, 2024 • 253k • 69 • 17

RLHFlow/DS-and-Mistral-PRM-Data
Viewer • Updated Nov 10, 2024 • 526k • 41

TIGER-Lab/WebInstruct-CFT
Viewer • Updated Feb 2 • 654k • 173 • 56

deu05232/promptriever-ours2-filtered_FN
Viewer • Updated Feb 10 • 1.31M • 132

argilla/distilabel-intel-orca-dpo-pairs
Viewer • Updated Aug 7 • 12.9k • 3.92k • 181