Hello everyone!
I have created a custom dataset in the IOB tagging format and want to follow the HF tutorial for Token Classification. The data looks like this:
Sydney _ _ B-ORG
Airport _ _ I-ORG
announces _ _ O
range _ _ O
of _ _ O
new _ _ O
sustainability _ _ O
goals _ _ O
Finland _ _ B-LOCATION
is _ _ O
on _ _ O
track _ _ O
to _ _ O
be _ _ O
carbon _ _ B-SUSTAINABILITY
neutral _ _ I-SUSTAINABILITY
by _ _ O
2035 _ _ B-DATE
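Each line above has four whitespace-separated columns: the token, two `_` placeholder columns, and the IOB tag. Assuming sentences are separated by blank lines (the empty strings in the loaded data further down suggest they are), a minimal pure-Python sketch for grouping the lines into one record per sentence could look like this. `read_iob` is a hypothetical helper name, and the `id` values are simply running sentence numbers:

```python
from typing import Dict, Iterable, List


def read_iob(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Group CoNLL-style lines ("token _ _ TAG") into sentence records.

    Assumes one token per line, the IOB tag in the last column,
    and a blank line between sentences.
    """
    sentences: List[Dict[str, object]] = []
    tokens: List[str] = []
    tags: List[str] = []

    def flush() -> None:
        # Close the current sentence (if it has any tokens) and start a new one.
        if tokens:
            sentences.append(
                {"id": str(len(sentences)), "tokens": list(tokens), "ner_tags": list(tags)}
            )
            tokens.clear()
            tags.clear()

    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: sentence boundary (repeated blank lines are harmless).
            flush()
            continue
        columns = line.split()
        tokens.append(columns[0])  # first column: the word
        tags.append(columns[-1])   # last column: the IOB tag
    flush()  # the file may not end with a blank line
    return sentences
```

On the real file, something like `read_iob(open("train.txt", encoding="utf-8"))` should work, with `train.txt` standing in for your actual path.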
When I load my dataset with load_dataset(), I only get a “text” column, but no “labels”:
DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 285996
    })
})
When I look at the data in the data loader, each line of the file is a separate string:
'Sydney _ _ B-ORG',
'Airport _ _ I-ORG',
'announces _ _ O',
'range _ _ O',
'of _ _ O',
'new _ _ O',
'sustainability _ _ O',
'goals _ _ O',
'',
'',
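The empty strings here are the blank lines of the file, i.e. the sentence boundaries, so all the information is present; it just has not been split into tokens and tags. A further difference from WNUT 17 is that its `ner_tags` are integer class ids rather than tag strings. A minimal sketch for turning tag strings into ids, assuming an "O first, then alphabetical" label order (an arbitrary choice for illustration; any fixed order works as long as the same list is reused everywhere). `build_label_maps` and `encode_tags` are hypothetical helper names:

```python
from typing import Dict, List, Sequence, Tuple


def build_label_maps(tag_sequences: Sequence[Sequence[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Collect every tag that occurs and assign it a stable integer id.

    "O" is always placed first (id 0); the remaining tags are sorted alphabetically.
    """
    seen = {tag for tags in tag_sequences for tag in tags}
    label_list = ["O"] + sorted(seen - {"O"})
    label2id = {label: i for i, label in enumerate(label_list)}
    return label_list, label2id


def encode_tags(tags: Sequence[str], label2id: Dict[str, int]) -> List[int]:
    # Replace each tag string with its integer id; an unknown tag raises KeyError.
    return [label2id[tag] for tag in tags]
```

The resulting `label_list` is also what you would later turn into the `id2label`/`label2id` mappings for the model.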
Did I miss something?
How can I create the same structure as the WNUT 17 dataset in the tutorial (“id”, “ner_tags”, and “tokens”)?
Thank you for your help.
Roman