Hello,
I am trying to follow the PyTorch question answering example. However, when running the run_qa.py script using my own (Dutch, machine-translated) SQuAD train and test files (JSON), I get the following error:
pyarrow.lib.ArrowInvalid: cannot mix list and non-list, non-null values
I use the following command and hyperparameters:
python run_qa.py \
--model_name_or_path GroNLP/bert-base-dutch-cased \
--version_2_with_negative \
--do_train \
--do_eval \
--train_file "C:\Users\myname\data\squad\nl_squad_train_clean.json" \
--test_file "C:\Users\myname\data\squad\nl_squad_dev_clean.json" \
--per_device_train_batch_size 12 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps=800 \
--output_dir ../output
When I replace the train and test files with --dataset_name squad, it works fine. What could be the problem with my own SQuAD files?
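A minimal sketch of one plausible cause and fix, under stated assumptions. The `squad` dataset on the Hub is stored flat (one record per question, with `answers` as a dict of `text` and `answer_start` lists), while the original SQuAD JSON files nest questions under articles and paragraphs. If a local file is nested, or if some `answers` entries are strings while others are lists, Arrow can end up with a column mixing list and non-list values. The sketch assumes the files follow the nested SQuAD v2.0 layout and that run_qa.py reads local JSON with `field="data"` (worth checking in your copy of the script). The names `flatten_squad` and `convert_file` are hypothetical helpers, not part of transformers or datasets.

```python
import json
from typing import Any


def flatten_squad(nested: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Flatten nested SQuAD-style JSON into one record per question.

    Output columns mirror the Hub `squad`/`squad_v2` layout: id, title, context,
    question, answers={"text": [...], "answer_start": [...]}. Records are wrapped
    under a top-level "data" key (assumption: the script loads with field="data").
    """
    records = []
    for article in nested["data"]:
        title = article.get("title", "")
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Unanswerable v2 questions get empty lists (never None or a string),
                # so every row has the same list-typed "answers" structure.
                answers = [] if qa.get("is_impossible") else qa.get("answers", [])
                records.append({
                    "id": str(qa["id"]),
                    "title": title,
                    "context": context,
                    "question": qa["question"],
                    "answers": {
                        "text": [str(a["text"]) for a in answers],
                        "answer_start": [int(a["answer_start"]) for a in answers],
                    },
                })
    return {"data": records}


def convert_file(src: str, dst: str) -> None:
    """Read a nested SQuAD JSON file and write the flattened version."""
    with open(src, encoding="utf-8") as f:
        nested = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the output file
        json.dump(flatten_squad(nested), f, ensure_ascii=False)
```

Usage (output paths hypothetical): convert_file("nl_squad_train_clean.json", "nl_squad_train_flat.json"), then pass the converted files to run_qa.py instead of the originals.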
Thanks in advance! Cheers!