I did get your point, but unfortunately PhoBERT does not have a fast version of its tokenizer.
Therefore, I manually wrote a function that does the thing mentioned:
One way to handle this is to only train on the tag labels for the first subtoken of a split token. We can do this in
Transformers by setting the labels we wish to ignore to
-100. In the example above, if the label for `@HuggingFace` is 3 (indexing `B-corporation`), we would set the labels of `['@', 'hugging', '##face']` to `[3, -100, -100]`.
# Assumes `examples` is a dict with word-level "labels" and "token" columns,
# and `tokenizer` is the (slow) PhoBERT tokenizer.
for i, label in tqdm(enumerate(examples["labels"]), total=len(examples["labels"])):
    steps = []   # subtoken indexes whose labels should be ignored
    batch = 0    # running offset caused by words split into several subtokens
    for index, value in enumerate(examples["token"][i]):
        len_to_compare = len(tokenizer.tokenize(value))
        if len_to_compare > 1:
            # all subtokens after the first one get ignored
            steps += list(range(index + batch + 1, index + batch + len_to_compare))
            batch += len_to_compare - 1
With the function above I simply store the array of indexes that should be ignored; however, my results got worse.
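For comparison, here is a minimal sketch of the same first-subtoken alignment done directly on the labels instead of collecting indexes afterwards. The function name `align_labels_with_subtokens` and the `tokenize` callback are my own illustrative choices, not part of Transformers; in practice you would pass `tokenizer.tokenize` of the slow PhoBERT tokenizer:

```python
def align_labels_with_subtokens(words, word_labels, tokenize):
    """Expand word-level labels to subtoken level: the first subtoken of each
    word keeps the word's label, every following subtoken gets -100 so the
    loss function ignores it (hypothetical helper, not a Transformers API)."""
    aligned = []
    for word, label in zip(words, word_labels):
        subtokens = tokenize(word)  # e.g. tokenizer.tokenize(word) for a slow tokenizer
        aligned.append(label)                       # first subtoken keeps the label
        aligned.extend([-100] * (len(subtokens) - 1))  # rest are ignored
    return aligned


# Toy tokenizer standing in for PhoBERT, just to show the shape of the output:
def toy_tokenize(word):
    return ["@", "hugging", "##face"] if word == "@HuggingFace" else [word]


print(align_labels_with_subtokens(["@HuggingFace", "rocks"], [3, 0], toy_tokenize))
# [3, -100, -100, 0]
```

Note this still ignores special tokens (`<s>`, `</s>`) that the tokenizer adds around the sequence; those also need a -100 label before the list is fed to the model.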
