Token Classification run_NER.py AttributeError

I noticed that saving the train and test datasets as CSV/JSON and reloading them does not give back the original dataset: in the reloaded dataset, the ner_tags feature is no longer an instance of ClassLabel.
However, when I save the train/test datasets in Arrow format and reload them, the reloaded dataset is the same as the original one, with the label feature still an instance of ClassLabel.

I have modified run_NER.py to consume the train/test/validation datasets in the following way:

    if data_args.dataset_name is not None:
        # Downloading and loading a dataset from the hub.
        raw_datasets = load_dataset(
            data_args.dataset_name,
            data_args.dataset_config_name,
            cache_dir=model_args.cache_dir,
            use_auth_token=True if model_args.use_auth_token else None,
        )
        if "train" in raw_datasets:
            train_dataset = raw_datasets['train']
        if "test" in raw_datasets:
            test_dataset = raw_datasets['test']
        if "validation" in raw_datasets:
            validation_dataset = raw_datasets['validation']
    else:
        # data_files = {}
        # if data_args.train_file is not None:
        #     data_files["train"] = data_args.train_file
        # if data_args.validation_file is not None:
        #     data_files["validation"] = data_args.validation_file
        # if data_args.test_file is not None:
        #     data_files["test"] = data_args.test_file
        # extension = data_args.train_file.split(".")[-1]
        # raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir)

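        # Load datasets that were previously saved in Arrow format with save_to_disk.
        # Requires `from datasets import load_from_disk` at the top of the script.
        train_dataset = load_from_disk(data_args.train_file)
        test_dataset = load_from_disk(data_args.test_file)
        if data_args.validation_file is not None:
            validation_dataset = load_from_disk(data_args.validation_file)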

In the rest of run_NER.py, I just replaced raw_datasets with the appropriate train_dataset/test_dataset/validation_dataset.

Note: If anyone knows how to combine the two datasets loaded from Arrow-format files, please feel free to drop the solution… :slight_smile: Until then this hack works :wink:

-Thanks,
Dinesh