The corpipe25-corefud1.3-base-251101 Model

The corpipe25-corefud1.3-base-251101 is a multilingual coreference resolution model based on umT5-base, usable with CorPipe 25 (https://github.com/ufal/crac2025-corpipe). It is released on LINDAT/CLARIAH-CZ and on HuggingFace under the CC BY-NC-SA 4.0 license. The model is downloaded automatically from HuggingFace when prediction is run with the --load ufal/corpipe25-corefud1.3-base-251101 argument.
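
If you want to pre-fetch the model files (for example before moving to a machine without internet access), the repository can be downloaded into the local HuggingFace cache ahead of time. The snippet below is a minimal sketch using the huggingface_hub package, which is not required by CorPipe itself; CorPipe downloads the model automatically as described above.

# Sketch: pre-download the model repository into the local HuggingFace cache.
# Not required for normal use; CorPipe 25 downloads the model automatically
# when run with --load ufal/corpipe25-corefud1.3-base-251101.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ufal/corpipe25-corefud1.3-base-251101")
print("Model files cached in", local_dir)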

The model is language agnostic, so in theory it can be used to predict coreference in any language covered by umT5; for zero-shot cross-lingual evaluation, please refer to the CRAC 2025 paper.

The model expects empty nodes to be already present in the input, predicted by https://github.com/ufal/crac2025_empty_nodes_baseline.
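
In the CoNLL-U format, empty nodes are the lines whose ID contains a decimal point (e.g. 2.1). If you are unsure whether your input already contains them, a quick check such as the following sketch (plain Python, independent of CorPipe) can help:

# Sketch: count empty nodes (lines with decimal IDs such as 2.1) in a CoNLL-U file.
import sys

def count_empty_nodes(path):
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue  # skip comments and sentence separators
            node_id = line.split("\t", 1)[0]
            if "." in node_id:  # empty-node IDs have the form N.M
                count += 1
    return count

print(count_empty_nodes(sys.argv[1]))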

The model was trained using the following command (see the CorPipe 25 repository for more information):

tbs="ca_ancora cs_pcedt cs_pdt cu_proiel de_potsdamcc en_gum en_litbank es_ancora fr_ancor fr_democrat grc_proiel hbo_ptnk hi_hdtb hu_korkor hu_szegedkoref ko_ecmt lt_lcc no_bokmaalnarc no_nynorsknarc pl_pcc ru_rucor tr_itcc"

python3 corpipe25.py --train --dev --treebanks $(for c in $tbs; do echo data/$c/$c-corefud-train.conllu; done) --batch_size=8 --learning_rate=6e-4 --learning_rate_decay  --adafactor --encoder=google/umt5-base --exp=corpipe25-corefud1.3-base --compile
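
The command above assumes the CorefUD 1.3 training data to be available as data/<treebank>/<treebank>-corefud-train.conllu. A small sanity check of this layout (a sketch, independent of CorPipe) could look as follows:

# Sketch: verify that every treebank used above has its training data in the
# data/<treebank>/<treebank>-corefud-train.conllu location assumed by the command.
import os

treebanks = ("ca_ancora cs_pcedt cs_pdt cu_proiel de_potsdamcc en_gum en_litbank "
             "es_ancora fr_ancor fr_democrat grc_proiel hbo_ptnk hi_hdtb hu_korkor "
             "hu_szegedkoref ko_ecmt lt_lcc no_bokmaalnarc no_nynorsknarc pl_pcc "
             "ru_rucor tr_itcc").split()

missing = [tb for tb in treebanks
           if not os.path.exists(f"data/{tb}/{tb}-corefud-train.conllu")]
print("missing training files:", missing or "none")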

CorefUD 1.3 Test Sets Results

The model achieves the following results on the CorefUD 1.3 test sets (as reported in the paper); a segment size of 2560 was used, with the exception of cu_proiel and grc_proiel, where it was 512:

treebank        score
avg             69.27
ca_ancora       77.4
cs_pcedt        73.5
cs_pdt          75.1
cu_proiel       53.5
de_potsdamcc    62.0
en_gum          71.0
en_litbank      72.8
es_ancora       78.6
fr_ancor        71.2
fr_democrat     66.7
grc_proiel      64.9
hbo_ptnk        59.0
hi_hdtb         72.7
hu_korkor       61.5
hu_szegedkoref  63.7
ko_ecmt         67.8
lt_lcc          72.9
no_bokmaalnarc  73.2
no_nynorsknarc  70.4
pl_pcc          74.5
ru_rucor        77.8
tr_itcc         63.9

Running the Model on Plain Text

To run the model on plain text, the text first needs to be tokenized and converted to the CoNLL-U format (and optionally parsed, if you also want mention heads), for example using UDPipe 2:

curl -F data="Eve came home and Peter greeted her there. Then Peter and Paul set out to a trip and Eve waved them off." \
  -F model=english -F tokenizer= -F tagger= -F parser=  https://lindat.mff.cuni.cz/services/udpipe/api/process \
  | python -X utf8 -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])" >input.conllu
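
The same UDPipe 2 REST request can also be issued from Python; the snippet below is an equivalent sketch using the requests package (the service URL and parameters are exactly those of the curl command above):

# Sketch: tokenize, tag, and parse plain text with the UDPipe 2 REST API,
# equivalent to the curl command above; requires the requests package.
import requests

text = ("Eve came home and Peter greeted her there. "
        "Then Peter and Paul set out to a trip and Eve waved them off.")
response = requests.post(
    "https://lindat.mff.cuni.cz/services/udpipe/api/process",
    data={"data": text, "model": "english", "tokenizer": "", "tagger": "", "parser": ""},
)
response.raise_for_status()
with open("input.conllu", "w", encoding="utf-8") as f:
    f.write(response.json()["result"])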

Then the CoNLL-U file can be processed by CorPipe 25, for example by running

python3 corpipe25.py --load ufal/corpipe25-corefud1.3-base-251101 --exp . --epoch 0 --test input.conllu

which would generate the following predictions in input.00.conllu:

# generator = UDPipe 2, https://lindat.mff.cuni.cz/services/udpipe
# udpipe_model = english-ewt-ud-2.17-251125
# udpipe_model_licence = CC BY-NC-SA
# newdoc
# global.Entity = eid-etype-head-other
# newpar
# sent_id = 1
# text = Eve came home and Peter greeted her there.
1	Eve	Eve	PROPN	NNP	Number=Sing	2	nsubj	_	Entity=(c1--1)
2	came	come	VERB	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	0	root	_	_
3	home	home	ADV	RB	_	2	advmod	_	Entity=(c2--1)
4	and	and	CCONJ	CC	_	6	cc	_	_
5	Peter	Peter	PROPN	NNP	Number=Sing	6	nsubj	_	Entity=(c3--1)
6	greeted	greet	VERB	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	2	conj	_	_
7	her	she	PRON	PRP	Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs	6	obj	_	Entity=(c1--1)
8	there	there	ADV	RB	PronType=Dem	6	advmod	_	Entity=(c2--1)|SpaceAfter=No
9	.	.	PUNCT	.	_	2	punct	_	_

# sent_id = 2
# text = Then Peter and Paul set out to a trip and Eve waved them off.
1	Then	then	ADV	RB	PronType=Dem	5	advmod	_	_
2	Peter	Peter	PROPN	NNP	Number=Sing	5	nsubj	_	Entity=(c4--1(c3--1)
3	and	and	CCONJ	CC	_	4	cc	_	_
4	Paul	Paul	PROPN	NNP	Number=Sing	2	conj	_	Entity=(c5--1)c4)
5	set	set	VERB	VBD	Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin	0	root	_	_
6	out	out	ADP	RP	_	5	compound:prt	_	_
7	to	to	ADP	IN	_	9	case	_	_
8	a	a	DET	DT	Definite=Ind|PronType=Art	9	det	_	Entity=(c6--2
9	trip	trip	NOUN	NN	Number=Sing	5	obl	_	Entity=c6)
10	and	and	CCONJ	CC	_	12	cc	_	_
11	Eve	Eve	PROPN	NNP	Number=Sing	12	nsubj	_	Entity=(c1--1)
12	waved	wave	VERB	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	5	conj	_	_
13	them	they	PRON	PRP	Case=Acc|Number=Plur|Person=3|PronType=Prs	12	obj	_	Entity=(c4--1)
14	off	off	ADP	RP	_	12	compound:prt	_	SpaceAfter=No
15	.	.	PUNCT	.	_	5	punct	_	SpaceAfter=No
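
The coreference clusters are encoded in the Entity attribute of the MISC column, following the global.Entity = eid-etype-head-other declaration above. The snippet below is a simplified sketch of collecting the mentions of each cluster from such a file; it only handles the bracket patterns occurring in this example, so for anything beyond illustration a full CorefUD reader (such as the one in the Udapi library) should be used instead.

# Sketch: collect coreference mentions per cluster from the Entity attribute
# in the MISC column of a CorefUD CoNLL-U file. Simplified: handles only the
# bracket patterns shown in the example above, not every CorefUD construct.
import collections
import re
import sys

clusters = collections.defaultdict(list)       # eid -> list of (sent_id, from, to)
open_mentions = collections.defaultdict(list)  # eid -> stack of (sent_id, from)
sent_id = None

for line in open(sys.argv[1], encoding="utf-8"):
    line = line.rstrip("\n")
    if line.startswith("# sent_id = "):
        sent_id = line[len("# sent_id = "):]
    if line.startswith("#") or not line.strip():
        continue
    columns = line.split("\t")
    token_id, misc = columns[0], columns[9]
    entity = next((v[len("Entity="):] for v in misc.split("|") if v.startswith("Entity=")), None)
    if entity is None:
        continue
    # Each item is a single-token mention "(eid-etype-head-other)", an opening
    # bracket "(eid-etype-head-other", or a closing bracket "eid)".
    for single, opening, closing in re.findall(
            r"\(([^-(]+)[^()]*\)|\(([^-(]+)[^()]*|([^-()]+)\)", entity):
        if single:
            clusters[single].append((sent_id, token_id, token_id))
        elif opening:
            open_mentions[opening].append((sent_id, token_id))
        else:
            start_sent, start_token = open_mentions[closing].pop()
            clusters[closing].append((start_sent, start_token, token_id))

for eid, mentions in sorted(clusters.items()):
    print(eid, mentions)

On the example output above, this prints, for instance, cluster c1 with the mentions Eve (sentence 1, token 1), her (sentence 1, token 7), and Eve (sentence 2, token 11), and cluster c4 with Peter and Paul (sentence 2, tokens 2-4) and them (sentence 2, token 13).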

How to Cite

@inproceedings{straka-2025-corpipe,
  title = "{C}or{P}ipe at {CRAC} 2025: Evaluating Multilingual Encoders for Multilingual Coreference Resolution",
  author = "Straka, Milan",
  editor = "Ogrodniczuk, Maciej and Novak, Michal and Poesio, Massimo and Pradhan, Sameer and Ng, Vincent",
  booktitle = "Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.crac-1.11/",
  doi = "10.18653/v1/2025.crac-1.11",
  pages = "130--139",
}