arxiv:2605.19908

Where Does Authorship Signal Emerge in Encoder-Based Language Models?

Published on May 19 · Submitted by Francis Kulumba on May 20

ALMAnaCH (Inria)


Authors: Francis Kulumba

Abstract

Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and function-word frequency are equally available at every layer in every model, including in an off-the-shelf control encoder, so the gap does not come from representation quality. Instead, causal intervention shows that the scorer determines where the encoder consolidates authorship signal. Mean pooling forces consolidation by early to mid layers, while late interaction defers it to later layers. We further derive this difference from the gradient structure of each scorer, and training dynamics reveal distinct learning trajectories that follow from that difference.

AI-generated summary

Authorship attribution performance varies significantly with the scoring mechanism rather than with representation quality; the layers at which authorship signal consolidates are determined by each scorer's gradient structure and training dynamics.
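To make the two scorers named in the abstract concrete, here is a minimal NumPy sketch of how each might turn per-token encoder outputs into a similarity score between two texts. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the embeddings are random stand-ins for encoder outputs, cosine similarity is assumed for both scorers, and the late-interaction variant follows the common ColBERT-style MaxSim pattern, averaged over tokens rather than summed.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def mean_pool_score(a_tokens, b_tokens):
    """Hypothetical mean-pooling scorer.

    Each text is collapsed to one vector (the average of its token
    embeddings) before comparison, so the score only sees the pooled vectors.
    """
    a = l2_normalize(a_tokens.mean(axis=0))
    b = l2_normalize(b_tokens.mean(axis=0))
    return float(a @ b)


def late_interaction_score(a_tokens, b_tokens):
    """Hypothetical late-interaction (MaxSim-style) scorer.

    Token embeddings are kept separate. Each token of text A is matched to
    its most similar token in text B, and the best matches are averaged
    (an assumption here; ColBERT itself sums them).
    """
    a = l2_normalize(a_tokens)
    b = l2_normalize(b_tokens)
    sim = a @ b.T  # shape: (len_a, len_b), pairwise cosine similarities
    return float(sim.max(axis=1).mean())


# Random stand-ins for encoder outputs: 5 and 7 tokens, hidden size 8.
rng = np.random.default_rng(0)
text_a = rng.normal(size=(5, 8))
text_b = rng.normal(size=(7, 8))

print("mean pooling     :", mean_pool_score(text_a, text_b))
print("late interaction :", late_interaction_score(text_a, text_b))
```

One way to read the abstract's point about gradient structure, ignoring the normalization step: under mean pooling every token of a text receives the same gradient from the score (the other text's pooled vector, scaled by the token count), whereas under MaxSim each token receives a gradient from its own best-matching token in the other text. The paper's actual derivation may differ from this simplified reading.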


Community

Madjakul (paper author, paper submitter), about 8 hours ago

The main bottleneck in contrastive authorship attribution is not whether stylistic information exists in the encoder, but whether the scoring mechanism can preserve and exploit it.


