KeyError: "length" - load_from_disk Training Model on AWS SageMaker

Hello everyone!

I was following the workshop by @philschmid -

MLOps - E2E

Why is it not working anymore?

AlgorithmError: ExecuteUserScriptError: Command "/opt/conda/bin/python3.8 train.py --epochs 3 --eval_batch_size 64 --fp16 True --learning_rate 3e-05 --model_id distilbert-base-uncased --train_batch_size 32"
Traceback (most recent call last):
  File "train.py", line 46, in <module>
    train_dataset = load_from_disk(args.training_dir)
  File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1165, in load_from_disk
    return Dataset.load_from_disk(dataset_path, fs, keep_in_memory=keep_in_memory)
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 744, in load_from_disk
    dataset_info = DatasetInfo.from_dict(json.load(dataset_info_file))
  File "/opt/conda/lib/python3.8/site-packages/datasets/info.py", line 267, in from_dict
    return cls(**{k: v for k, v in dataset_info_dict.items() if k in field_names})
  File "<string>", line 20, in __init__
  File "/opt/conda/lib/python3.8/site-packages/datasets/info.py", line 143, in __post_init__
    self.features = Features.fr
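The trace fails inside load_from_disk while it parses dataset_info.json, so one thing worth checking is whether the datasets version inside the training container matches the version used with save_to_disk during preprocessing. A rough local check of that load step; the SM_CHANNEL_TRAIN default and the local path are placeholders, not the exact workshop code:

```python
# Sketch of the failing load step plus a version check; paths and the
# SM_CHANNEL_TRAIN default are assumptions based on the workshop setup.
import os
import datasets
from datasets import load_from_disk

# Version of the `datasets` library doing the reading. If this differs from
# the version that ran save_to_disk() during preprocessing, parsing
# dataset_info.json (DatasetInfo.from_dict / Features) can fail as above.
print("datasets version:", datasets.__version__)

# Inside the training container this is the S3 channel mounted by SageMaker;
# locally, point it at a copy of the saved dataset to reproduce the error.
training_dir = os.environ.get("SM_CHANNEL_TRAIN", "./data/train")  # placeholder path
train_dataset = load_from_disk(training_dir)
print(train_dataset)
```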

Even after changing the dataset and updating the transformers version to 4.17.0 and PyTorch to 1.10.2, I still get:

AlgorithmError: ExecuteUserScriptError: ExitCode 1 ErrorMessage "KeyError: 'length'" Command "/opt/conda/bin/python3.8 train.py --epochs 3 --eval_batch_size 64 --fp16 True --learning_rate 3e-05 --model_id distilbert-base-uncased --train_batch_size 32", exit code: 1
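For reference, a sketch of the estimator configuration with those versions; the hyperparameters mirror the command line above, while the instance type, role, and source_dir are placeholders rather than the exact workshop setup:

```python
# Rough estimator setup for the versions mentioned above; instance type,
# source_dir and the assumption that this fixes the error are not verified.
import sagemaker
from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",               # assumed folder containing train.py
    instance_type="ml.p3.2xlarge",        # placeholder instance type
    instance_count=1,
    role=sagemaker.get_execution_role(),  # SageMaker execution role
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
    hyperparameters={
        "epochs": 3,
        "train_batch_size": 32,
        "eval_batch_size": 64,
        "learning_rate": 3e-5,
        "model_id": "distilbert-base-uncased",
        "fp16": True,
    },
)
```

A requirements.txt placed next to train.py in source_dir is installed in the container before the script runs, so the datasets version used when saving the data can be pinned there as well.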

@MrRobotV8 I think we can keep the conversation at the GitHub issue level: KeyError Length during training following workshop MLOps · Issue #12 · philschmid/huggingface-sagemaker-workshop-series · GitHub


Perfect! Thank you 🙂

I have the same error, and I think it has to do with my estimator struggling to locate my "train.py" file. Does anyone else find this a bit hit and miss? I've had it run successfully before and have no idea what to do.
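In case it is the path: entry_point is resolved relative to source_dir, so a quick local check before calling fit() can rule that out. The folder and file names here are assumptions matching the workshop layout:

```python
# Quick sanity check that the entry point is where the estimator will look
# for it; "scripts" and "train.py" are assumed names, adjust to your project.
import os

source_dir = "./scripts"     # passed to the estimator as source_dir
entry_point = "train.py"     # passed to the estimator as entry_point

# entry_point must exist inside source_dir, otherwise the training job
# fails before the script ever runs:
print(os.path.exists(os.path.join(source_dir, entry_point)))
```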