How to modify a loaded dataset

Hi,
I would like to ask how I can modify a loaded dataset to change the values of the passages, and how I can check the details of the dataset. Currently, I am running the example script from Hugging Face (transformers/run_squad.py at main · huggingface/transformers · GitHub).

Also, when I run the script with:
python run_squad.py --model_type bert --model_name_or_path bert-base-cased --do_train --do_eval --do_lower_case --train_file $SQUAD_DIR/train-v1.1.json --predict_file $SQUAD_DIR/dev-v1.1.json --per_gpu_train_batch_size 12 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /tmp/debug_squad/
I get this error: FileNotFoundError: [Errno 2] No such file or directory: '/train-v1.1.json'
How can I fix this problem?

Could someone please give me some ideas about how to work with the dataset? That would be much appreciated!

Hi @yolo1! You can use the Dataset.map function to change the values of examples. And with load_dataset_builder("your_dataset_name").info you can check metadata about the dataset, including its features and sometimes its size.