"No such file or directory" when pushing to the Hub from a SageMaker training job

Hi,
I’m getting an error when trying to push my fine-tuned model to the Hub. Following the example notebook "Spot instances in SageMaker", I ran the training with these settings:

  • In the notebook:
hyperparameters={
    ...
    'output_dir':'/opt/ml/checkpoints'
}
  • Inside train.py
training_args = TrainingArguments(
    output_dir=args.output_dir,
    overwrite_output_dir=get_last_checkpoint(args.output_dir) is not None,
    ...
)
# Finetune the model
# Look up the last checkpoint once and resume from it if one exists
last_checkpoint = get_last_checkpoint(args.output_dir)
if last_checkpoint is not None:
    logger.info("***** continue training *****")
    trainer.train(resume_from_checkpoint=last_checkpoint)
else:
    logger.info("***** start training *****")
    trainer.train()

...

# Push to HuggingFace
kwargs = {
    ...
}

trainer.push_to_hub(**kwargs)

From the logs I can see that "model_dir": "/opt/ml/model".
However, when pushing the model to the Hub I get this error:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/checkpoints/model.safetensors.sagemaker-uploading'

I can see these files in my bucket after the job is completed:

config.json
model.safetensors
preprocessor_config.json
README.md
training_args.bin

It looks like ‘.sagemaker-uploading’ is appended to the file name at some point. I don’t know what causes this behavior or how to disable it.
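
One workaround I am considering is sketched below. It assumes (I have not confirmed this) that the suffix comes from SageMaker temporarily renaming files while it syncs /opt/ml/checkpoints to S3. The idea is to copy the final model files to a separate staging directory that is not synced and push from there. The function name `stage_model_files` and the staging path are hypothetical, used only for illustration.

```python
import shutil
import time
from pathlib import Path

# Suffix taken from the error message above.
UPLOADING_SUFFIX = ".sagemaker-uploading"


def stage_model_files(src_dir, dst_dir, retries=5, delay=1.0):
    """Copy top-level files from src_dir to dst_dir, avoiding temporary upload names.

    Hypothetical helper: waits a few times while any file in src_dir still
    carries the temporary suffix, then copies the regular files.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    for _ in range(retries):
        # Stop waiting as soon as no file carries the temporary suffix.
        if not any(p.name.endswith(UPLOADING_SUFFIX) for p in src.iterdir()):
            break
        time.sleep(delay)

    copied = []
    for path in sorted(src.iterdir()):
        # Skip subdirectories (e.g. checkpoint-*) and leftover temporary names.
        if path.is_file() and not path.name.endswith(UPLOADING_SUFFIX):
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

In train.py this would be called after training, for example `stage_model_files(args.output_dir, "/tmp/hub_staging")` (an assumed path), and the staged folder could then be uploaded with `huggingface_hub.HfApi().upload_folder(folder_path=..., repo_id=...)` instead of `trainer.push_to_hub`. A file could still be renamed between the check and the copy, so this only narrows the window rather than closing it.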