Changing ClassLabels for NER

2ndBestKiller · November 12, 2023, 11:36am

Hey!

Thanks so much for your answer!

That's interesting. I actually figured out a different way. I suspected something like a missing "config" file, since this kind of metadata can't be stored in a JSONL file. But when I did it the way shown in this tutorial:

for split, dataset in drug_dataset_clean.items():
    dataset.to_json(f"drug-reviews-{split}.jsonl")

it didn't create an info file. However, if I write the entire DatasetDict to disk with the save_to_disk method, I get the Arrow format files with the exact info file you are mentioning. Buuut when I uploaded that to HF via the website, it couldn't read the data and was just showing rubbish (I uploaded the entire folder that was created for the DatasetDict).

So what I eventually ended up doing was creating the ClassLabels like this:

from datasets import ClassLabel, DatasetDict, Sequence

ner_class_labels = ClassLabel(num_classes=3, names=["O", "B-DRUG", "I-DRUG"])

# train, test and validation are my datasets
train = train.cast_column("ner_tags", Sequence(ner_class_labels))
test = test.cast_column("ner_tags", Sequence(ner_class_labels))
validation = validation.cast_column("ner_tags", Sequence(ner_class_labels))

→ Sequence being the key word here! Because this is what I was missing the entire time.

dataset_dict = DatasetDict({"train": train, "validation": validation, "test": test})

and pushing it to the Hub via the dataset_dict.push_to_hub("myHub", token="mytoken") method.

This way it directly cast each int in the int list to a ClassLabel with the meta information. And since I pushed it to the Hub directly with push_to_hub (which, as far as I can tell, uploads Parquet files together with the feature metadata rather than raw Arrow files), it stored the meta information correctly in the ClassLabel objects without needing the info.json file.

But it's good to know it works with the info file as well. However, I was only able to create it with the DatasetDict.save_to_disk method. Can you create it even if you just save a single dataset to disk as JSONL?

But at any rate, thanks again!
