How to load my own BILOU/IOB labels for training?

Hi everyone,

I’m not sure if I just missed it in the documentation, but I’m looking to fine-tune a model with my own annotated data (exported from Doccano). I’m comfortable converting the Doccano output into whatever format HuggingFace needs, but I’m not sure how to load my own data with the IOB labels. The “Fine tune with a custom dataset” section in the documentation doesn’t actually use a custom dataset; it uses one of the built-in examples. The HuggingFace Datasets documentation also doesn’t explain how to load NER labels, unless I’m missing something (which I probably am).

If someone has an example of how to format the data labels and how to use load_dataset() to create an NER dataset, I’d really appreciate the help! Thanks everyone!


I was having the same problem and this helped. Basically you still need to create your own data loader.
Based on what the documentation describes, I thought the Datasets library could automatically identify and load common data formats. Guess I was wrong…
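For what it's worth, here is a minimal sketch of what such a hand-rolled loader might look like. It makes several assumptions that may not match your setup: that the Doccano export is JSONL with a "text" field and a "label" field holding [start, end, label] character spans, that plain whitespace tokenization is good enough, and that the column names "tokens" and "ner_tags" are what your training script expects. The helper names spans_to_iob and build_columns are made up for this example. The last step uses Dataset.from_dict with a ClassLabel feature and is skipped if the datasets package is not installed.

```python
import json
import re


def spans_to_iob(text, spans):
    """Whitespace-tokenize `text` and give each token an IOB2 tag.

    `spans` is a list of [start, end, label] character offsets (assumed
    Doccano layout). A token is tagged only if it lies fully inside a span,
    so punctuation glued to an entity (e.g. "Paris.") would need extra care.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            if tok_start >= start and tok_end <= end:
                # The token that opens the span gets B-, the rest get I-.
                tag = ("B-" if tok_start == start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


def build_columns(records):
    """Turn parsed records into column lists plus the sorted tag names."""
    rows = [spans_to_iob(r["text"], r["label"]) for r in records]
    label_names = sorted({tag for _, tags in rows for tag in tags})
    label2id = {name: i for i, name in enumerate(label_names)}
    columns = {
        "tokens": [tokens for tokens, _ in rows],
        "ner_tags": [[label2id[t] for t in tags] for _, tags in rows],
    }
    return columns, label_names


# Two example lines standing in for a Doccano JSONL export (made-up data).
jsonl_text = """{"text": "Angela Merkel visited Paris", "label": [[0, 13, "PER"], [22, 27, "LOC"]]}
{"text": "Doccano exports JSONL", "label": []}"""

records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
columns, label_names = build_columns(records)

try:
    from datasets import ClassLabel, Dataset, Features, Sequence, Value

    features = Features(
        {
            "tokens": Sequence(Value("string")),
            "ner_tags": Sequence(ClassLabel(names=label_names)),
        }
    )
    dataset = Dataset.from_dict(columns, features=features)
except ImportError:
    dataset = None  # datasets not installed; `columns` still holds the data
```

Storing the tags as ClassLabel ids rather than raw strings keeps the id-to-name mapping attached to the dataset, which is handy later when you set up the model's label mapping. If your tokens need to line up with a subword tokenizer, the labels will still have to be re-aligned after tokenization; this sketch does not cover that.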