IOB tagging for NER / DatasetDict

R0m4ntic · June 5, 2023, 11:35am

Hello everyone!

I have created a custom dataset in the IOB tagging format. I want to follow the HF tutorial for Token Classification.

Sydney _ _ B-ORG
Airport _ _ I-ORG
announces _ _ O
range _ _ O
of _ _ O
new _ _ O
sustainability _ _ O
goals _ _ O

Finland _ _ B-LOCATION
is _ _ O
on _ _ O
track _ _ O
to _ _ O
be _ _ O
carbon _ _ B-SUSTAINABILITY
neutral _ _ I-SUSTAINABILITY
by _ _ O
2035 _ _ B-DATE

When I load my dataset with load_dataset(), I only get a “text” column, not the labels.

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 285996
    })
})

When I look at the data after loading, each line of the file appears as a single string, as follows.

‘Sydney _ _ B-ORG’,
‘Airport _ _ I-ORG’,
‘announces _ _ O’,
‘range _ _ O’,
‘of _ _ O’,
‘new _ _ O’,
‘sustainability _ _ O’,
‘goals _ _ O’,
‘’,
‘’,

Did I miss something?
How can I create a structure similar to the WNUT 17 dataset (“id”, “tokens”, and “ner_tags”) used in the tutorial?
Thank you for your help.

Roman

Noura123 · November 20, 2023, 9:23am

Did you find a solution for that? I’m having the same trouble as well.
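
For anyone landing here with the same problem: the output above is consistent with the file being read by the generic text loader, which yields one “text” row per line. Below is a minimal sketch of one possible workaround, not an official recipe. It assumes each non-blank line is whitespace-separated with the token in the first column and the IOB tag in the last, and that a blank line ends a sentence. The names read_iob, to_dataset, and SAMPLE are hypothetical helpers introduced for illustration; only Dataset.from_dict, Features, Sequence, Value, and ClassLabel come from the datasets library.

```python
from typing import Dict, Iterable, List

# Short excerpt in the format shown in the original post.
SAMPLE = """Sydney _ _ B-ORG
Airport _ _ I-ORG
announces _ _ O

Finland _ _ B-LOCATION
is _ _ O
"""


def read_iob(lines: Iterable[str]) -> List[Dict]:
    """Group 'token ... tag' lines into sentences; a blank line ends a sentence."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        parts = line.split()
        if not parts:  # blank line: close the current sentence, if any
            if tokens:
                sentences.append(
                    {"id": str(len(sentences)), "tokens": tokens, "ner_tags": tags}
                )
                tokens, tags = [], []
            continue
        tokens.append(parts[0])   # assumption: token is the first column
        tags.append(parts[-1])    # assumption: IOB tag is the last column
    if tokens:  # file may not end with a blank line
        sentences.append(
            {"id": str(len(sentences)), "tokens": tokens, "ner_tags": tags}
        )
    return sentences


def to_dataset(sentences: List[Dict]):
    """Build a Dataset with 'id', 'tokens', 'ner_tags' columns (requires `datasets`)."""
    from datasets import ClassLabel, Dataset, Features, Sequence, Value

    # Label order here is simply alphabetical; choose your own order if needed.
    label_names = sorted({tag for s in sentences for tag in s["ner_tags"]})
    label2id = {name: i for i, name in enumerate(label_names)}
    features = Features(
        {
            "id": Value("string"),
            "tokens": Sequence(Value("string")),
            "ner_tags": Sequence(ClassLabel(names=label_names)),
        }
    )
    data = {
        "id": [s["id"] for s in sentences],
        "tokens": [s["tokens"] for s in sentences],
        "ner_tags": [[label2id[t] for t in s["ner_tags"]] for s in sentences],
    }
    return Dataset.from_dict(data, features=features)


sentences = read_iob(SAMPLE.splitlines())
```

With a real file, something like `with open(path, encoding="utf-8") as f: sentences = read_iob(f)` followed by `to_dataset(sentences)` should give one row per sentence instead of one row per line. The result can be wrapped in a DatasetDict if the tutorial code expects a "train" split.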
