Human_Eval_RLHF

university

Recent Activity

seungone authored a paper 26 days ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

seungone authored a paper 26 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

DKYoon authored a paper 30 days ago

A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls

authored 2 papers 26 days ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

Paper • 2605.26457 • Published May 26 • 7

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published Jun 1 • 59

authored 4 papers 30 days ago

A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls

Paper • 2412.01340 • Published Dec 2, 2024

K-EXAONE Technical Report

Paper • 2601.01739 • Published Jan 5 • 95

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published May 20 • 13

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published Jun 1 • 59

submitted a paper to Daily Papers 30 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published Jun 1 • 59

authored 9 papers about 1 month ago

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

Paper • 2505.24456 • Published May 30, 2025

PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization

Paper • 2507.10057 • Published Jul 14, 2025

Rethinking Reward Models for Multi-Domain Test-Time Scaling

Paper • 2510.00492 • Published Oct 1, 2025 • 28

Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Paper • 2510.09201 • Published Oct 10, 2025 • 50

Efficient Long Context Language Model Retrieval with Compression

Paper • 2412.18232 • Published Dec 24, 2024 • 1

PREPING: Building Agent Memory without Tasks

Paper • 2605.13880 • Published May 11 • 28

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published May 20 • 13

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Paper • 2605.20258 • Published May 18 • 30

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Paper • 2605.29250 • Published May 28 • 79

submitted a paper to Daily Papers about 1 month ago

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Paper • 2605.29250 • Published May 28 • 79

authored a paper about 1 month ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published May 20 • 13

submitted a paper to Daily Papers about 1 month ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published May 20 • 13

authored a paper about 2 months ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Paper • 2603.18886 • Published Mar 19 • 6