XLM-RoBERTa multilingual NER with the OntoNotes 5 dataset

Constantin · March 8, 2021, 10:01pm

Hi

I would like to fine-tune XLM-RoBERTa for multilingual NER with the OntoNotes 5 dataset, but I really can’t work out how to do that. I have read the paper and I know the theory behind the process, but I can’t see how to do it with the transformers library, and I did not find any relevant example for it.

For now, I have my ontonotes5 data in the following form:

('لكن', 'O'),
('وزارة الداخلية الباكستانية', 'ORG'),
('وزارة', 'O'),
('الداخلية', 'O'),
('الباكستانية', 'O'),
('قالت', 'O'),
('ان', 'O'),
('11', 'CARDINAL'),
('11', 'O'),
('شخصا', 'O'),
('ً قتلوا', 'O'),

and this model: xlm-roberta-large

lewtun · March 8, 2021, 10:27pm

Hi @Constantin, there’s a detailed tutorial here on using transformers for NER: https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification.ipynb#scrollTo=545PP3o8IrJV
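For reference, the step in that kind of tutorial that usually needs the most care with XLM-R is aligning word-level tags with the subword pieces the tokenizer produces. Below is a minimal sketch of the idea, assuming you call a fast tokenizer with is_split_into_words=True and read the word indices from the encoding's word_ids(). The word_ids and label ids in the example are made-up values for illustration, and align_labels is a hypothetical helper name, not the notebook's exact code:

```python
from typing import List, Optional

# Label id that PyTorch's cross-entropy loss skips by default.
IGNORE_INDEX = -100


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_subwords: bool = False,
) -> List[int]:
    """Map one label per word onto one label per subword token.

    word_ids is what a fast tokenizer's encoding.word_ids() returns:
    None for special tokens, otherwise the index of the source word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens (<s>, </s>, padding) never contribute to the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subwords are either ignored or given the same label.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous = word_id
    return aligned


# Illustrative values only: three words, split into 2, 1 and 3 subwords.
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
word_labels = [0, 3, 0]  # hypothetical label ids, one per word
aligned = align_labels(word_ids, word_labels)
print(aligned)
```

With the default setting only the first piece of each word is scored; passing label_all_subwords=True copies the word's label to every piece instead.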

I’ve been able to use it with XLM-R without problems. In your case, the main work will be loading your data into a datasets.Dataset object (recommended for fast processing!). For that, see the datasets docs, or look at how one of the existing NER datasets is implemented to understand how the features need to be defined, e.g. GermanNER
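As a rough sketch of that loading step: judging only from the sample in the question, it looks like each entity appears once as a whole span with its type, followed by its individual tokens tagged O. That reading is an assumption, as is treating the rows as a single sentence. Under that assumption, something like the following would turn the rows into token-level IOB2 tags and build a Dataset with ClassLabel features. rows_to_iob2 and build_dataset are hypothetical helper names:

```python
from typing import List, Tuple

Row = Tuple[str, str]


def rows_to_iob2(rows: List[Row]) -> Tuple[List[str], List[str]]:
    """Convert (text, label) rows into parallel token and IOB2 tag lists.

    ASSUMPTION: a row with a non-O label is a whole entity span, and the
    rows right after it repeat its whitespace-separated tokens with label O.
    """
    tokens, tags = [], []
    i = 0
    while i < len(rows):
        text, label = rows[i]
        if label == "O":
            tokens.append(text)
            tags.append("O")
            i += 1
            continue
        span_tokens = text.split()
        following = [t for t, _ in rows[i + 1 : i + 1 + len(span_tokens)]]
        if following != span_tokens:
            # Fail loudly if the data does not match the assumed layout.
            raise ValueError(f"Row {i}: span is not followed by its tokens")
        for k, tok in enumerate(span_tokens):
            tokens.append(tok)
            tags.append(("B-" if k == 0 else "I-") + label)
        i += 1 + len(span_tokens)
    return tokens, tags


def build_dataset(sentences: List[Tuple[List[str], List[str]]]):
    """Build a datasets.Dataset from (tokens, tags) pairs, one per sentence."""
    # Imported here so the conversion above can be tried without the library.
    from datasets import ClassLabel, Dataset, Features, Sequence, Value

    label_names = sorted({tag for _, tags in sentences for tag in tags})
    label2id = {name: idx for idx, name in enumerate(label_names)}
    features = Features(
        {
            "tokens": Sequence(Value("string")),
            "ner_tags": Sequence(ClassLabel(names=label_names)),
        }
    )
    data = {
        "tokens": [toks for toks, _ in sentences],
        "ner_tags": [[label2id[t] for t in tags] for _, tags in sentences],
    }
    return Dataset.from_dict(data, features=features)


# Rows copied from the sample in the question (treated as one sentence).
rows = [
    ("لكن", "O"),
    ("وزارة الداخلية الباكستانية", "ORG"),
    ("وزارة", "O"),
    ("الداخلية", "O"),
    ("الباكستانية", "O"),
    ("قالت", "O"),
    ("ان", "O"),
    ("11", "CARDINAL"),
    ("11", "O"),
    ("شخصا", "O"),
]
tokens, tags = rows_to_iob2(rows)
print(tags)
```

Calling build_dataset([(tokens, tags)]) would then give a one-example Dataset with tokens and ner_tags columns, which is the layout that token classification examples generally expect. For the full corpus you would pass one (tokens, tags) pair per sentence, so you need sentence boundaries from the original OntoNotes files.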
