Coreference Resolution, how?

Aqilaas · March 17, 2022, 4:31am

Hello, I have fine-tuned and trained Hugging Face models before, but I am very confused by my current task, which is coreference resolution.
Basically, I have a dataset of [Tokens, ListOfCoreferenceLabel] entries, one for each passage, covering about 2500 passages.
Tokens are the words in a text, and ListOfCoreferenceLabel is a list of label arrays; each label array may contain more than one item, drawn from different categories (Nothing, Relations, Mentions).

Is token classification the right choice?
How should I shape the dataset so that it fits training? That is, is there a standardized data format for coreference resolution training?
Relations look like ‘IDENT[10_12]’ (an IDENT relation between id 10 and id 12, where id 12 is the token that carries this relation as a label), and Mentions look like ‘PROPER[10]’ (this token is a PROPER mention and has been assigned id 10). The numbers are the IDs. How do I properly represent the relations between different mentions in a training dataset? Any approach is welcome, whether PyTorch, TensorFlow, or anything else.
Is there anything else important I might be missing? My head is full of these 3 questions for now.
Are there any tutorials?
Thanks in advance
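
To make the label format above concrete, here is a minimal sketch of one way to parse it into mention IDs, relation pairs, and coreference clusters (groups of mention IDs linked by IDENT). Grouping mentions into clusters is a common way to represent coreference data, but this is only an illustration under assumptions: the regular expressions are inferred from the two examples ‘PROPER[10]’ and ‘IDENT[10_12]’, the function names `parse_passage` and `build_clusters` are hypothetical, the `PRONOUN` type in the sample is made up, and any label that matches neither pattern (such as a "Nothing" label) is simply skipped.

```python
import re
from collections import defaultdict

# Patterns are assumptions inferred from the examples 'PROPER[10]' and 'IDENT[10_12]'.
MENTION_RE = re.compile(r"^([A-Z]+)\[(\d+)\]$")
RELATION_RE = re.compile(r"^([A-Z]+)\[(\d+)_(\d+)\]$")


def parse_passage(labels_per_token):
    """Collect mentions and relations from a list of per-token label arrays."""
    mentions = {}   # mention id -> (mention type, [token indices])
    relations = []  # (relation type, first id, second id)
    for idx, labels in enumerate(labels_per_token):
        for label in labels:
            m = MENTION_RE.match(label)
            if m:
                mid = int(m.group(2))
                # A mention id may appear on several tokens, so keep all indices.
                mentions.setdefault(mid, (m.group(1), []))[1].append(idx)
                continue
            r = RELATION_RE.match(label)
            if r:
                relations.append((r.group(1), int(r.group(2)), int(r.group(3))))
            # Anything else (e.g. a "Nothing" label) is ignored.
    return mentions, relations


def build_clusters(mentions, relations, relation_type="IDENT"):
    """Group mention ids connected by `relation_type` using union-find."""
    parent = {mid: mid for mid in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for rel, a, b in relations:
        if rel == relation_type and a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for mid in mentions:
        clusters[find(mid)].append(mid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical sample passage; 'PRONOUN' is an invented mention type.
tokens = ["Alice", "said", "she", "left"]
labels = [["PROPER[10]"], [], ["PRONOUN[12]", "IDENT[10_12]"], []]

mentions, relations = parse_passage(labels)
clusters = build_clusters(mentions, relations)
print(clusters)  # [[10, 12]]
```

Each cluster could then be mapped back to token indices through `mentions`, which gives a list of mention positions per entity rather than a flat tag per token.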
