arxiv:2601.05882

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

Published on Jan 9 · Submitted by Xingwei Tan on Jan 12

Abstract

Preference tuning of language models shows varying generalization capabilities under domain shift, with pseudo-labeling adaptation strategies effectively reducing performance degradation in summarization and question-answering tasks.

AI-generated summary

Preference tuning aligns pretrained language models to human judgments of quality, helpfulness, or safety by optimizing over explicit preference signals rather than likelihood alone. Prior work has shown that preference tuning degrades performance and reduces helpfulness when evaluated outside the training domain. However, the extent to which adaptation strategies mitigate this domain shift remains unexplored. We address this challenge by conducting a comprehensive and systematic study of alignment generalization under domain shift. We compare five popular alignment objectives and various source-to-target adaptation strategies, including target-domain supervised fine-tuning and pseudo-labeling, across summarization and question-answering helpfulness tasks. Our findings reveal systematic differences in generalization across alignment objectives under domain shift, and we show that adaptation strategies based on pseudo-labeling can substantially reduce domain-shift degradation.
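
The page does not name the five alignment objectives studied, but Direct Preference Optimization (DPO) is a representative way to optimize over explicit preference signals rather than likelihood alone. Below is a minimal PyTorch sketch of the DPO loss, offered as an illustration of this class of objective rather than the paper's implementation:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities for the
    chosen / rejected completion under the trainable policy or the frozen
    reference model.
    """
    # Implicit rewards: scaled log-ratio of policy to reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen completion's implicit reward above the rejected one's
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy check with random log-probabilities for a batch of 4 pairs
torch.manual_seed(0)
rand_lp = lambda: torch.randn(4)
print(dpo_loss(rand_lp(), rand_lp(), rand_lp(), rand_lp()))
```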

Community

Paper author · Paper submitter

Our paper presents a systematic study of preference optimization under domain shift. We compare five popular alignment objectives and various source-to-target adaptation strategies, including target-domain supervised fine-tuning and pseudo-labeling, across summarization and question-answering helpfulness tasks. We found:

  • The adaptation strategy is more influential than the alignment objective.
  • Synthetic supervision is a double-edged sword: while pseudo-labeling yields the highest target-domain win rates, it induces severe mode collapse. This diversity tax results in models that are highly reliable but linguistically monotonous, mirroring the latent templates of the teacher model (a hedged sketch of the pseudo-labeling pattern follows this list).
  • Our findings suggest a deployment recommendation: use pseudo-labeling for high-stakes, constrained tasks where reliability is paramount, but favor mixed-domain SFT and online RL for applications requiring creative or varied linguistic expression.
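
The exact pseudo-labeling recipe is not detailed on this page; the sketch below assumes a common pattern in which a scorer (standing in for a source-aligned teacher or reward model, both hypothetical here) ranks completions sampled on unlabeled target-domain prompts, and the top- and bottom-ranked candidates become synthetic preference pairs:

```python
def build_pseudo_preference_pairs(student, scorer, target_prompts,
                                  num_samples=4):
    """Hypothetical pseudo-labeling loop for target-domain adaptation.

    `student.generate(prompt) -> str` and `scorer(prompt, completion) ->
    float` are assumed interfaces for illustration, not the paper's API.
    """
    pairs = []
    for prompt in target_prompts:
        # Sample several candidate completions on the unlabeled target domain
        candidates = [student.generate(prompt) for _ in range(num_samples)]
        scores = [scorer(prompt, c) for c in candidates]
        # Top- and bottom-ranked candidates form a synthetic preference pair
        best = max(range(num_samples), key=scores.__getitem__)
        worst = min(range(num_samples), key=scores.__getitem__)
        pairs.append({"prompt": prompt,
                      "chosen": candidates[best],
                      "rejected": candidates[worst]})
    return pairs
```

Feeding such self-generated pairs into a preference objective like the dpo_loss sketch above is one place the reported mode collapse can originate: every synthetic pair inherits the scorer's latent templates.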

