Deploying T5-style models via Sagemaker Endpoint: 'T5LayerFF' object has no attribute 'config'

Hello,

I was trying to deploy google/flan-t5-small, just as described in the following notebook: notebooks/deploy_transformer_model_from_hf_hub.ipynb at main · huggingface/notebooks · GitHub

When I deployed it, however, I ran into the following:

2022-10-28T10:30:02,085 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 219, in handle
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self.initialize(context)
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 77, in initialize
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self.model = self.load(self.model_dir)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 104, in load
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     hf_pipeline = get_pipeline(task=os.environ["HF_TASK"], model_dir=model_dir, device=self.device)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 272, in get_pipeline
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     hf_pipeline = pipeline(task=task, model=model_dir, device=device, **kwargs)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 549, in pipeline
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     framework, model = infer_framework_load_model(
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/base.py", line 247, in infer_framework_load_model
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     model = model_class.from_pretrained(model, **kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 447, in from_pretrained
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1493, in from_pretrained
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     model = cls(config, *model_args, **model_kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1473, in __init__
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self.encoder = T5Stack(encoder_config, self.shared)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 838, in __init__
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     [T5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 838, in <listcomp>
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     [T5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 631, in __init__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self.layer.append(T5LayerFF(config))
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 319, in __init__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     f"{self.config.feed_forward_proj} is not supported. Choose between `relu` and `gated-gelu`"
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1177, in __getattr__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise AttributeError("'{}' object has no attribute '{}'".format(
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'T5LayerFF' object has no attribute 'config'
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred:
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/mms/service.py", line 108, in predict
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     ret = self._entry_point(input_batch, self.context)
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 243, in handle
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise PredictionException(str(e), 400)
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: 'T5LayerFF' object has no attribute 'config' : 400

The code in question seems to be located at the following link: transformers/modeling_t5.py at v4.17.0 · huggingface/transformers · GitHub

That code has changed quite a bit since version 4.17.0, which is what the latest Deep Learning Container uses. I was therefore wondering whether this means that T5 models from the Hugging Face Hub would have trouble when deployed via SageMaker Endpoints? I did not see the same issue with the latest version (4.23.1), but it seems that installing that version would require providing a custom inference.py script along with the model weights. Could you please confirm whether that's the case, and/or let me know if there is a way to get the latest version of transformers working with the SageMaker-provided Deep Learning Containers without packaging the model weights and a custom inference.py script?

Thanks!


Setup:

  • transformers==4.17.0
  • torch==1.10.2
  • Python version: py38

Deployment script:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker 

role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
  'HF_MODEL_ID':'google/flan-t5-small', # model_id from hf.co/models
  'HF_TASK':'text2text-generation' # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.17.0", # transformers version used
   pytorch_version="1.10.2", # pytorch version used
   py_version="py38", # python version of the DLC
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)
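
For completeness, once the endpoint is up I would invoke it roughly like this (the payload follows the usual text2text-generation pipeline format):

# example request against the deployed endpoint
predictor.predict({
   "inputs": "Translate English to German: How old are you?"
})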

To use the latest version of the transformers library in a SageMaker DLC, you don't have to provide a custom inference script, just a requirements.txt file with a line that says transformers==4.23.1

The DLC will then install/update to the specified transformers version. I'm not entirely sure that will solve your issue, but it's always good to at least try it out with the latest version :slight_smile:

Thanks @marshmellow77!

Would you happen to know where one would need to place this requirements.txt file? I can't seem to find a configuration option in HuggingFaceModel that would allow me to pass its location at instantiation or deployment, but I might have very easily missed something :slight_smile:

Ah, I see, because you’re deploying directly from the hub via the env parameter … hmm …

I usually download the model locally and put the requirements.txt file into the model directory, like it is done in this example (and as described in this documentation).

But the HuggingFaceModel class also has an entry_point parameter, and I'm wondering if it can be used in conjunction with the env parameter. You could try it out, and if it doesn't work you could still fall back to putting the requirements.txt file directly into the model directory as in the example I mentioned.
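
Something along these lines might work – just an untested sketch; inference.py here is a hypothetical minimal script (it could simply defer to the default handlers), and "code" is whatever local folder holds it together with the requirements.txt:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

huggingface_model = HuggingFaceModel(
   env={
      'HF_MODEL_ID': 'google/flan-t5-small',   # still pull the weights from the Hub
      'HF_TASK': 'text2text-generation'
   },
   role=sagemaker.get_execution_role(),
   entry_point="inference.py",  # hypothetical script shipped alongside requirements.txt
   source_dir="code",           # local folder containing inference.py and requirements.txt
   transformers_version="4.17.0",
   pytorch_version="1.10.2",
   py_version="py38",
)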

Hope that helps!

1 Like

Thanks @marshmellow77!

I am afraid this didn’t work for me, so I fell back to putting requirements.txt into the model directory. I had wanted to avoid that, as the model is quite large (39GB) and comes straight from the HF Hub – all I was adding was that requirements.txt.
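
For anyone running into the same thing, the fallback looks roughly like this (a sketch only – the S3 key prefix is a placeholder, and the local model/ folder is assumed to already contain the weights downloaded from the Hub plus code/requirements.txt with the single line transformers==4.23.1):

import tarfile
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# Assumed local layout:
#   model/                       <- weights downloaded from the Hub
#   model/code/requirements.txt  <- transformers==4.23.1
with tarfile.open("model.tar.gz", "w:gz") as tar:
   tar.add("model", arcname=".")

sess = sagemaker.Session()
model_data = sess.upload_data("model.tar.gz", key_prefix="flan-t5")  # placeholder prefix

huggingface_model = HuggingFaceModel(
   model_data=model_data,
   role=sagemaker.get_execution_role(),
   env={'HF_TASK': 'text2text-generation'},
   transformers_version="4.17.0",
   pytorch_version="1.10.2",
   py_version="py38",
)
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")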

Thanks again for your help!

@philschmid just for the sake of completeness, is there a chance I’ve missed anything? Is it possible to use entry_point in conjunction with the environment variables that are set via the env parameter?

Thanks!