Deploying T5-style models via SageMaker Endpoint: 'T5LayerFF' object has no attribute 'config'


I was trying to deploy google/flan-t5-small, just as described in the following notebook: notebooks/deploy_transformer_model_from_hf_hub.ipynb at main · huggingface/notebooks · GitHub

When I deployed it, however, I ran into the following:

2022-10-28T10:30:02,085 [INFO ] W-google__flan-t5-small-31-stdout - Prediction error
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout - Traceback (most recent call last):
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 219, in handle
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -     self.initialize(context)
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 77, in initialize
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -     self.model = self.load(self.model_dir)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 104, in load
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -     hf_pipeline = get_pipeline(task=os.environ["HF_TASK"], model_dir=model_dir, device=self.device)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 272, in get_pipeline
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -     hf_pipeline = pipeline(task=task, model=model_dir, device=device, **kwargs)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/", line 549, in pipeline
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -     framework, model = infer_framework_load_model(
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/", line 247, in infer_framework_load_model
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     model = model_class.from_pretrained(model, **kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/", line 447, in from_pretrained
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/", line 1493, in from_pretrained
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     model = cls(config, *model_args, **model_kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 1473, in __init__
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     self.encoder = T5Stack(encoder_config, self.shared)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 838, in __init__
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     [T5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 838, in <listcomp>
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     [T5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 631, in __init__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     self.layer.append(T5LayerFF(config))
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 319, in __init__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     f"{self.config.feed_forward_proj} is not supported. Choose between `relu` and `gated-gelu`"
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/", line 1177, in __getattr__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     raise AttributeError("'{}' object has no attribute '{}'".format(
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - AttributeError: 'T5LayerFF' object has no attribute 'config'
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - 
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - During handling of the above exception, another exception occurred:
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - 
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - Traceback (most recent call last):
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/mms/", line 108, in predict
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -     ret = self._entry_point(input_batch, self.context)
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 243, in handle
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -     raise PredictionException(str(e), 400)
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout - mms.service.PredictionException: 'T5LayerFF' object has no attribute 'config' : 400

The code in question seems to be located at the following link: transformers/ at v4.17.0 · huggingface/transformers · GitHub

It has also changed quite a bit since version 4.17.0, which is what the latest Deep Learning Container uses. Does this mean that T5 models from the Hugging Face Hub will have trouble when deployed via SageMaker Endpoints? I did not see the same issue in the latest version (4.23.1), but it seems that installing that version would require providing a custom script along with the model weights. Could you please confirm that's the case, and/or let me know whether there is a way to get the latest version of transformers working with the SageMaker-provided Deep Learning Containers without supplying the model weights and a custom script?



  • transformers==4.17.0
  • torch==1.10.2
  • Python version: py38

Deployment script:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration.
hub = {
  'HF_MODEL_ID': 'google/flan-t5-small',  # model_id from
  'HF_TASK': 'text2text-generation'       # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,                        # Hub configuration from above
   role=role,                      # IAM role with permissions to create an Endpoint
   transformers_version="4.17.0",  # transformers version used
   pytorch_version="1.10.2",       # pytorch version used
   py_version="py38",              # python version of the DLC
)

# deploy model to SageMaker Inference (instance count/type here are illustrative)
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge",
)

To use the latest version of the transformers library in a SageMaker DLC you don’t have to provide a custom inference script, just a requirements.txt file with a line that says transformers==4.23.1

The DLC will then install/update to the specified transformers version. Not entirely sure if that will solve your issue but it’s always good to at least try it out with the latest version :slight_smile:

Thanks @marshmellow77!

Would you happen to know where would one need to place this requirements.txt file? I do not seem to be able to find a configuration option in the HuggingFaceModel that would allow me to pass its location on its instantiation or deployment but I might have very easily missed something :slight_smile:

Ah, I see, because you’re deploying directly from the hub via the env parameter … hmm …

I usually download the model locally and put the requirements.txt file into the model directory, like it is done in this example (and as described in this documentation).
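Roughly, the repacking looks like this (a sketch; the code/ subdirectory is the conventional place the SageMaker Hugging Face inference toolkit looks for requirements.txt, and the directory names are just examples):

```python
import os
import tarfile

# download/copy the model files into a local "model/" directory first,
# then add the requirements.txt under model/code/
os.makedirs("model/code", exist_ok=True)
with open("model/code/requirements.txt", "w") as f:
    f.write("transformers==4.23.1\n")

# repack everything into model.tar.gz for upload to S3
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for name in os.listdir("model"):
        tar.add(os.path.join("model", name), arcname=name)
```

The resulting model.tar.gz then gets uploaded to S3 and passed to HuggingFaceModel via model_data instead of the hub env variables.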

But the HuggingFaceModel class also has an entry_point parameter and I’m wondering if it can be used in conjunction with the env parameter. You could try it out, and if it doesn’t work you can still fall back to putting the requirements.txt file directly into the model directory, as in the example I mentioned.
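Untested, but the combination I have in mind would look something like this (inference.py and the code/ directory are hypothetical names, and whether the toolkit honors both at once is exactly the open question):

```python
# Hypothetical kwargs combining the hub env config with a custom entry point
# (untested idea -- not a confirmed feature of the SageMaker SDK).
model_kwargs = {
    "transformers_version": "4.17.0",
    "pytorch_version": "1.10.2",
    "py_version": "py38",
    "env": {
        "HF_MODEL_ID": "google/flan-t5-small",
        "HF_TASK": "text2text-generation",
    },
    "entry_point": "inference.py",  # hypothetical custom script
    "source_dir": "code",           # local dir that would also hold requirements.txt
}
# huggingface_model = HuggingFaceModel(role=role, **model_kwargs)
```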

Hope that helps!


Thanks @marshmellow77!

I am afraid this didn’t work for me, so I fell back to putting requirements.txt into the model directory. I had wanted to avoid that, as the model is quite large (39 GB) and comes straight from the HF Hub; all I was adding was that requirements.txt.

Thanks again for your help!

@philschmid just for the sake of completeness, is there a chance I’ve missed anything? Is it possible to use entry_point in conjunction with the environment variables that are set via the env parameter?