Any Model for NER on French

sergulaydore · September 14, 2020, 5:43pm

Hi all,

I have been looking for a model to run a NER task in French. I see there are CamemBERT and RoBERTa models for token classification, but these models are not fine-tuned for any NER task. Any suggestions? If there is no such model, is there a French dataset tagged for NER?

Thank you,
Sergul

stefan-it · September 18, 2020, 9:49am

I just asked Pedro (https://github.com/pjox), maybe he knows some good NER datasets for French that are publicly available for fine-tuning.

If someone could give me access to the FTB dataset (see the CamemBERT paper), I could fine-tune a model and upload it to the model hub.

stefan-it · September 18, 2020, 12:48pm

An alternative would be to use “silver standard” datasets such as WikiANN (PAN-X) or WikiNER, both of which include French.

sergulaydore · September 18, 2020, 1:36pm

Thank you @stefan-it. If I use WikiNER, do you know of a good way to convert it to CoNLL format?

stefan-it · September 18, 2020, 1:38pm

The last time I worked with WikiNER, I wrote my own script that converts the dataset into a CoNLL-like format. You can find it here:

https://github.com/stefan-it/fine-tuned-berts-seq/blob/master/scripts/preprocess_wikiner.py

import sys

filename = sys.argv[1]


def parse_file(filename: str):
    with open(filename, "rt") as f_p:
        for line in f_p:
            line = line.rstrip()

            if not line:
                continue

            # Convert
            # L'|DET:ART|O Afghanistan|NAM|I-LOC a|VER:pres|O pour|PRP|O codes|NOM|O :|PUN|O
            # into CoNLL-like format:
            # token ner
            print(
                "\n".join(f"{' '.join(word.split('|')[::2])}" for word in line.split())
            )

This file has been truncated.
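Since the preview above is cut off, here is a minimal self-contained sketch of the same conversion. It is not the original script: the function name `convert_wikiner` is hypothetical, and emitting a blank line after each sentence is an assumption based on the usual CoNLL convention rather than something visible in the truncated preview. It assumes every word has the form `token|POS|NER`, as in the comment above.

```python
import sys
from typing import Iterable, Iterator


def convert_wikiner(lines: Iterable[str]) -> Iterator[str]:
    """Yield CoNLL-like "token tag" lines from WikiNER sentence lines.

    Hypothetical helper; assumes each word looks like token|POS|NER.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines in the input
        for word in line.split():
            # Split from the right so only the last two fields are POS and NER.
            token, _pos, ner = word.rsplit("|", 2)
            yield f"{token} {ner}"
        # Assumption: a blank line separates sentences, as in CoNLL files.
        yield ""


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rt", encoding="utf-8") as f_p:
        for out_line in convert_wikiner(f_p):
            print(out_line)
```

Saved under a file name of your choice, it would be run with the WikiNER file as its only argument and the output redirected to a new file. A word that does not have three `|`-separated fields raises a `ValueError`, which makes malformed input easy to spot.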

sergulaydore · September 18, 2020, 1:41pm

Excellent! Thank you so much and please let me know if you upload a NER model in French

sergulaydore · September 18, 2020, 1:51pm

Oh, one more thing: @stefan-it, do you have an evaluation script (F1, accuracy, etc.) for CoNLL-format data?

sergulaydore · September 18, 2020, 10:13pm

Found it here https://github.com/huggingface/transformers/tree/master/examples/token-classification
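
For readers who want to see what entity-level scoring on CoNLL-style tags involves, here is a rough sketch in plain Python. It is not the script from the linked directory; the names `extract_spans` and `entity_f1` are hypothetical. It assumes tags such as `O`, `I-LOC`, or `B-PER`, and counts a predicted entity as correct only if its type and boundaries match the gold entity exactly. In practice, a library such as seqeval is commonly used for this.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from IOB tags (entities may start with B- or I-)."""
    spans: Set[Span] = set()
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current
        if current is not None and not continues:
            spans.add((current, start, i))
            start, current = None, None
        if prefix in ("B", "I") and not continues:
            start, current = i, label
    return spans


def entity_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exact-match entity spans."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_spans(gold_tags)
        pred_spans = extract_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Each inner list holds the tags of one sentence, with gold and predicted sentences in the same order and of the same length.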
