I'm facing the same issue with the following library versions:
datasets == 1.11.0
sagemaker == 2.48.1
UnexpectedStatusException: Error for Training job huggingface-pytorch-training-2021-08-19-12-34-40-568: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 train.py --epochs 3 --model_name bert-base-uncased --train_batch_size 16"
Traceback (most recent call last):
  File "train.py", line 41, in <module>
    train_dataset = load_from_disk(args.training_dir)
  File "/opt/conda/lib/python3.6/site-packages/datasets/load.py", line 781, in load_from_disk
    return Dataset.load_from_disk(dataset_path, fs)
  File "/opt/conda/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 684, in load_from_disk
    state = {k: state[k] for k in dataset.__dict__.keys()}  # in case we add new fields
  File "/opt/conda/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 684, in <dictcomp>
    state = {k: state[k] for k in dataset.__dict__.keys()}  # in case we add new fields
KeyError: '_data'
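
In case it helps with debugging: the traceback shows the loader looking up `_data` in the saved state and not finding it. Below is a minimal sketch for checking that from inside the training script. It assumes the saved dataset directory contains a `state.json` file and that `args.training_dir` is a local path in the container. The helper name `missing_state_keys` is made up for illustration and is not part of `datasets`.

```python
import json
from pathlib import Path


def missing_state_keys(dataset_dir, expected_keys=("_data",)):
    """Return the expected keys that are absent from <dataset_dir>/state.json.

    Hypothetical helper: the file name "state.json" and the default key
    "_data" are assumptions based on the traceback above.
    """
    state_path = Path(dataset_dir) / "state.json"
    with state_path.open(encoding="utf-8") as f:
        state = json.load(f)
    # Keep only the keys the loader would ask for but the file does not have.
    return [key for key in expected_keys if key not in state]
```

Calling `print(datasets.__version__, missing_state_keys(args.training_dir))` just before `load_from_disk` in train.py would show which `datasets` version the container actually runs and whether `_data` is missing from the saved state. If the container version differs from the 1.11.0 used to save the dataset, that mismatch would be my first suspect, though I have not confirmed it.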