`flan-t5-xl` model does not appear to have a file named `pytorch_model.bin`

I am trying to fine-tune a flan-t5-xl model using run_summarization.py as the training script on Amazon SageMaker.

This is my main script:

from sagemaker.huggingface import HuggingFace

git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch': 'v4.17.0'}

# hyperparameters, which are passed into the training job
hyperparameters={'per_device_train_batch_size': 8,
                 'per_device_eval_batch_size': 8,
                 'model_name_or_path': 'google/flan-t5-xl',
                 'dataset_name': 'samsum',
                 'do_train': True,
                 'do_eval': True,
                 'do_predict': True,
                 'predict_with_generate': True,
                 'output_dir': f'{output_location}/model',
                 'num_train_epochs': 1,
                 'learning_rate': 5e-5,
                 'seed': 7,
                 'fp16': True,
                 'max_source_length': 1153,
                 'max_target_length': 95,
                 'source_prefix': 'summarize: '}

# create the Estimator
huggingface_estimator = HuggingFace(

# starting the train job

However, I get this error:

OSError: google/flan-t5-xl does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

It seems that the model is split into two shards because of the 10 GB file size limit (pytorch_model-00001-of-00002.bin and pytorch_model-00002-of-00002.bin).

How could I approach this problem?

I have thought about downloading the model files, merging them into a single pytorch_model.bin file, and then specifying the appropriate model path in 'model_name_or_path'.
Would something like this work?

cat pytorch_model-00001-of-00002.bin pytorch_model-00002-of-00002.bin > pytorch_model.bin

Or perhaps I can download the pytorch_model.bin file directly from somewhere?
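
For context on the `cat` idea: each shard is a separately serialized checkpoint, so byte-level concatenation is unlikely to produce a loadable file. Sharded checkpoints ship with an index file, `pytorch_model.bin.index.json`, whose `weight_map` records which shard holds each weight. The sketch below only reads that index with the standard library; the function name `weights_per_shard` is hypothetical and the path is whatever local copy of the index you have.

```python
import json
from collections import defaultdict
from pathlib import Path


def weights_per_shard(index_path):
    """Group weight names by the shard file that stores them.

    `index_path` points at a local copy of pytorch_model.bin.index.json.
    """
    index = json.loads(Path(index_path).read_text())
    shards = defaultdict(list)
    # "weight_map" maps each parameter name to the shard file containing it.
    for weight_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(weight_name)
    return dict(shards)
```

Calling `weights_per_shard("pytorch_model.bin.index.json")` on a downloaded index should show that the two `.bin` files hold disjoint sets of weights, which is why a loader that understands the index is needed rather than a merged file.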

The issue here is that you are using transformers==4.17.0, which does not support sharded models. To fix this, create a requirements.txt in the ./examples/pytorch/summarization directory if it doesn't exist yet and add transformers==4.25.1 to it.
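
A minimal sketch of that fix, assuming you work from a local copy of the summarization scripts directory and pass it to the estimator as the source directory. The helper names `supports_sharded_checkpoints` and `pin_transformers` are hypothetical. The (4, 18) threshold reflects the transformers documentation, which states that checkpoints are sharded automatically since version 4.18.0.

```python
import re
from pathlib import Path

# Per the transformers docs, sharded checkpoints are supported since 4.18.0.
SHARDING_MIN_VERSION = (4, 18)


def supports_sharded_checkpoints(version):
    """Return True if a transformers version string is at least 4.18."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= SHARDING_MIN_VERSION


def pin_transformers(source_dir, version="4.25.1"):
    """Create or update requirements.txt in source_dir with a transformers pin."""
    req = Path(source_dir) / "requirements.txt"
    lines = req.read_text().splitlines() if req.exists() else []

    def is_transformers(line):
        # Compare only the package name, ignoring version specifiers and extras.
        name = re.split(r"[=<>!~\[;\s]", line.strip(), maxsplit=1)[0]
        return name.lower() == "transformers"

    # Drop any existing transformers pin, keep every other requirement.
    kept = [line for line in lines if not is_transformers(line)]
    kept.append(f"transformers=={version}")
    req.write_text("\n".join(kept) + "\n")
    return req
```

For example, `pin_transformers("./examples/pytorch/summarization")` leaves other requirements untouched and replaces an older transformers pin. `supports_sharded_checkpoints("4.17.0")` is False, which matches the error in the question.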

Thanks a lot for the answer @philschmid

The directory is located in the transformers repository from Hugging Face.
May I make a pull request against the v4.25-release branch to add transformers==4.25.1?

No, this is not necessary, since the scripts assume the correct version. We are working on updating the container version so that no change is needed.

Cool!

Is there an approximate date for the update to be completed?

Sometime in February. I'll let you know.

