How to load my own BILOU/IOB labels for training?

shensmobile · December 16, 2021, 1:06am

Hi everyone,

I’m not sure if I just missed it in the documentation, but I’m looking to fine-tune a model with my own annotated data (exported from Doccano). I’m comfortable converting the Doccano output into whatever format HuggingFace needs, but I’m not sure how to load my own data with the IOB labels. The “Fine tune with a custom dataset” section in the documentation doesn’t actually use a custom dataset (Section in question); it uses one of the built-in examples. The HuggingFace Datasets documentation also doesn’t explain how to load NER labels, unless I’m missing something (which I probably am).

If someone has an example of how to format data labels and how to use load_dataset() to create an NER dataset, I’d really appreciate the help! Thanks everyone!

BillDin · January 10, 2022, 8:31pm

I was having the same problem and this helped. Basically you still need to create your own data loader.
Based on what the documentation describes, I thought the Datasets library could automatically identify and load common data formats, but I guess I was wrong…
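
For anyone landing here later, here is a minimal sketch of what such a hand-written loader could look like. It is not the code from the linked post. It assumes a Doccano-style JSONL export where each record has a "text" field and a "label" list of [start, end, label] character spans (field names vary between Doccano versions, so check your export), and it uses plain whitespace tokenization. The helper names spans_to_iob, build_label_list and to_jsonl are made up for this example.

```python
import json
import re


def spans_to_iob(text, spans):
    """Convert character-level spans into whitespace tokens and IOB2 tags.

    `spans` is a list of [start, end, label] items (assumed Doccano-style).
    A token is assigned to the first span it overlaps; this is a
    simplification that ignores nested or overlapping annotations.
    """
    tokens, tags = [], []
    started = set()  # indices of spans that already received a B- tag
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for idx, (start, end, label) in enumerate(spans):
            if tok_start < end and tok_end > start:  # token overlaps span
                prefix = "I-" if idx in started else "B-"
                started.add(idx)
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


def build_label_list(all_tags):
    """Return a stable label list with "O" first, then the sorted tags."""
    found = {tag for tags in all_tags for tag in tags if tag != "O"}
    return ["O"] + sorted(found)


def to_jsonl(rows):
    """Serialize rows as JSON Lines, one example per line."""
    return "\n".join(json.dumps(row) for row in rows)


# Hypothetical example record, shaped like the assumed Doccano export.
records = [
    {"text": "Alice Smith lives in Paris",
     "label": [[0, 11, "PER"], [21, 26, "LOC"]]},
]

converted = [spans_to_iob(r["text"], r["label"]) for r in records]
label_list = build_label_list([tags for _, tags in converted])
label2id = {label: i for i, label in enumerate(label_list)}

# One row per sentence: the tokens plus integer tag ids.
rows = [
    {"tokens": tokens, "ner_tags": [label2id[t] for t in tags]}
    for tokens, tags in converted
]
jsonl_text = to_jsonl(rows)
```

If you write jsonl_text to a file such as train.jsonl (the file name is just an example), load_dataset("json", data_files="train.jsonl") should give you a dataset with tokens and ner_tags columns. Keep label_list around, because the model config needs the same id-to-label mapping.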
