Hello,
I’m fine-tuning RoBERTa for a token classification task; here is an example of the issue:
When we use any text,
for instance,
text = "Department of Cardiology, University Hospital of Nice, Nice, France."
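Roughly, this is how we run inference (a minimal sketch; the model name below is a placeholder for our fine-tuned checkpoint, and aggregation_strategy="simple" is an assumption about the pipeline setup):

```python
from transformers import pipeline

# Placeholder identifier for our fine-tuned RoBERTa token-classification checkpoint
ner = pipeline(
    "token-classification",
    model="our-finetuned-roberta-ner",
    aggregation_strategy="simple",  # groups B-/I- tokens into entity_group spans
)

text = "Department of Cardiology, University Hospital of Nice, Nice, France."
json_output = ner(text)
print(json_output)
```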
The entities are identified correctly in the json_output, but an extra space appears in front of every entity:
[
  {
    "entity_group": "SUB",
    "score": 0.9959414601325989,
    "word": " Department of Cardiology",
    "start": 0,
    "end": 24
  },
  {
    "entity_group": "ORG",
    "score": 0.9965003728866577,
    "word": " University Hospital of Nice",
    "start": 26,
    "end": 53
  },
  {
    "entity_group": "CITY",
    "score": 0.9671096801757812,
    "word": " Nice",
    "start": 55,
    "end": 59
  },
  {
    "entity_group": "COUNTRY",
    "score": 0.9924795627593994,
    "word": " France",
    "start": 61,
    "end": 67
  }
]
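The start/end offsets themselves look right: slicing the original text with them gives the entities without the leading space, so only the word field seems to be affected. A quick check (continuing from the sketch above):

```python
for ent in json_output:  # json_output = ner(text) from the sketch above
    print(repr(ent["word"]), "->", repr(text[ent["start"]:ent["end"]]))
# ' Department of Cardiology' -> 'Department of Cardiology'
# ' University Hospital of Nice' -> 'University Hospital of Nice'
# ' Nice' -> 'Nice'
# ' France' -> 'France'
```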
Furthermore, in the visual highlighting of the Inference API, when the text starts with an entity, that entity is not highlighted, even though it does appear in the json_output.
Does anyone know where the error comes from? We assume the entity is not highlighted because it is identified with an extra leading space that does not occur in the text. Or could this come from the RoBERTa tokenizer? We wanted to use a RoBERTa model, but when we used BERT this did not happen. How can we avoid this extra space before the recognised entity?
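In case it helps to reproduce, a minimal sketch of what we mean about the tokenizer (using the stock roberta-base tokenizer here for illustration, not our fine-tuned one):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")

# RoBERTa's byte-level BPE encodes the space before a word into the token itself
# (the "Ġ" prefix), so decoding such a token reproduces the leading space.
ids = tok("Nice, France.", add_special_tokens=False)["input_ids"]
print(tok.convert_ids_to_tokens(ids))             # 'France' shows up as 'ĠFrance'
print(tok.convert_tokens_to_string(["ĠFrance"]))  # prints ' France', with the space
```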
Many thanks!