IOB tagging for NER / DatasetDict

Hello everyone!

I have created a custom dataset in the IOB tagging format. I want to follow the HF tutorial for Token Classification.

Sydney _ _ B-ORG
Airport _ _ I-ORG
announces _ _ O
range _ _ O
of _ _ O
new _ _ O
sustainability _ _ O
goals _ _ O

Finland _ _ B-LOCATION
is _ _ O
on _ _ O
track _ _ O
to _ _ O
be _ _ O
carbon _ _ B-SUSTAINABILITY
neutral _ _ I-SUSTAINABILITY
by _ _ O
2035 _ _ B-DATE

When I load my dataset with load_dataset(), I only get a “text” column, but no “labels”.

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 285996
    })
})

When I look at the structure in the data loader, every line of the file is a separate “text” row:

'Sydney _ _ B-ORG',
'Airport _ _ I-ORG',
'announces _ _ O',
'range _ _ O',
'of _ _ O',
'new _ _ O',
'sustainability _ _ O',
'goals _ _ O',
'',
'',

Did I miss something?
How can I create a structure similar to the WNUT 17 dataset (“id”, “ner_tags”, and “tokens”) used in the tutorial?
Thank you for your help.

Roman

Did you find a solution for that? I’m having the same trouble.
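
Not a confirmed solution, but one approach that might work. The output above looks like what the generic text loader gives, one row per line, so it never splits off the tag column. A possible workaround is to parse the file yourself: group lines into sentences at the blank lines, take the first field as the token and the last field as the tag, and then build the dataset from plain Python lists. The sketch below assumes the whitespace-separated "token _ _ tag" layout from the original post. The helper names (parse_iob, build_label_list, to_dataset) are made up for illustration and are not part of the datasets library. Tags are mapped to integer ids by hand so that the ClassLabel feature receives ints.

```python
from typing import Dict, Iterable, List

# Small inline sample in the same "token _ _ tag" layout as the post.
SAMPLE = """\
Sydney _ _ B-ORG
Airport _ _ I-ORG
announces _ _ O

Finland _ _ B-LOCATION
is _ _ O
"""


def parse_iob(lines: Iterable[str]) -> List[Dict]:
    """Group IOB lines into sentences; blank lines end a sentence."""
    records: List[Dict] = []
    tokens: List[str] = []
    tags: List[str] = []

    def flush() -> None:
        # Store the current sentence (if any) and start a new one.
        nonlocal tokens, tags
        if tokens:
            records.append(
                {"id": str(len(records)), "tokens": tokens, "ner_tags": tags}
            )
            tokens, tags = [], []

    for raw in lines:
        line = raw.strip()
        if not line:
            flush()
            continue
        parts = line.split()
        tokens.append(parts[0])   # first field: the token
        tags.append(parts[-1])    # last field: the IOB tag
    flush()  # the file may not end with a blank line
    return records


def build_label_list(records: List[Dict]) -> List[str]:
    """Collect every tag seen in the data, in sorted order."""
    return sorted({tag for rec in records for tag in rec["ner_tags"]})


def to_dataset(records: List[Dict], label_names: List[str]):
    """Build a datasets.Dataset with WNUT-17-style columns (needs `datasets`)."""
    from datasets import ClassLabel, Dataset, Features, Sequence, Value

    label2id = {name: i for i, name in enumerate(label_names)}
    columns = {
        "id": [rec["id"] for rec in records],
        "tokens": [rec["tokens"] for rec in records],
        # Convert string tags to integer ids explicitly.
        "ner_tags": [[label2id[t] for t in rec["ner_tags"]] for rec in records],
    }
    features = Features(
        {
            "id": Value("string"),
            "tokens": Sequence(Value("string")),
            "ner_tags": Sequence(ClassLabel(names=label_names)),
        }
    )
    return Dataset.from_dict(columns, features=features)


records = parse_iob(SAMPLE.splitlines())
label_names = build_label_list(records)
```

For a real file you would pass the open file handle instead of SAMPLE.splitlines(), for example parse_iob(open("train.txt", encoding="utf-8")), where train.txt is a placeholder name. The result of to_dataset(records, label_names) can then be wrapped as DatasetDict({"train": ...}) to match the structure the tutorial expects.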