Deploying T5-style models via SageMaker Endpoint: 'T5LayerFF' object has no attribute 'config'


I was trying to deploy google/flan-t5-small, just as described in the following notebook: notebooks/deploy_transformer_model_from_hf_hub.ipynb at main · huggingface/notebooks · GitHub

When I deployed it, however, I ran into the following:

2022-10-28T10:30:02,085 [INFO ] W-google__flan-t5-small-31-stdout - Prediction error
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout - Traceback (most recent call last):
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 219, in handle
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -     self.initialize(context)
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 77, in initialize
2022-10-28T10:30:02,087 [INFO ] W-google__flan-t5-small-31-stdout -     self.model = self.load(self.model_dir)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 104, in load
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -     hf_pipeline = get_pipeline(task=os.environ["HF_TASK"], model_dir=model_dir, device=self.device)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 272, in get_pipeline
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -     hf_pipeline = pipeline(task=task, model=model_dir, device=device, **kwargs)
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/", line 549, in pipeline
2022-10-28T10:30:02,088 [INFO ] W-google__flan-t5-small-31-stdout -     framework, model = infer_framework_load_model(
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/", line 247, in infer_framework_load_model
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     model = model_class.from_pretrained(model, **kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/", line 447, in from_pretrained
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/", line 1493, in from_pretrained
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     model = cls(config, *model_args, **model_kwargs)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 1473, in __init__
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     self.encoder = T5Stack(encoder_config, self.shared)
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 838, in __init__
2022-10-28T10:30:02,089 [INFO ] W-google__flan-t5-small-31-stdout -     [T5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 838, in <listcomp>
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     [T5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 631, in __init__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     self.layer.append(T5LayerFF(config))
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/transformers/models/t5/", line 319, in __init__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     f"{self.config.feed_forward_proj} is not supported. Choose between `relu` and `gated-gelu`"
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/", line 1177, in __getattr__
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout -     raise AttributeError("'{}' object has no attribute '{}'".format(
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - AttributeError: 'T5LayerFF' object has no attribute 'config'
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - 
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - During handling of the above exception, another exception occurred:
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - 
2022-10-28T10:30:02,090 [INFO ] W-google__flan-t5-small-31-stdout - Traceback (most recent call last):
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/mms/", line 108, in predict
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -     ret = self._entry_point(input_batch, self.context)
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/", line 243, in handle
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout -     raise PredictionException(str(e), 400)
2022-10-28T10:30:02,091 [INFO ] W-google__flan-t5-small-31-stdout - mms.service.PredictionException: 'T5LayerFF' object has no attribute 'config' : 400

The code in question seems to be located at the following link: transformers/ at v4.17.0 · huggingface/transformers · GitHub

It has also changed quite a bit since version 4.17.0, which is what the latest Deep Learning Container uses. Does this mean that T5 models from the Hugging Face Hub will have trouble when deployed via SageMaker Endpoints? I did not see the same issue in the latest version (4.23.1), but it seems that installing that version would require providing a custom script along with the model weights. Could you please confirm that's the case, and/or let me know whether there is a way to get the latest version of transformers working with the SageMaker-provided Deep Learning Containers without supplying the model weights and a custom script?



  • transformers==4.17.0
  • torch==1.10.2
  • Python version: py38

Deployment script:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration.
hub = {
  'HF_MODEL_ID': 'google/flan-t5-small',  # model_id from
  'HF_TASK': 'text2text-generation'       # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,                        # Hub configuration from above
   role=role,                      # IAM role with permissions to create an Endpoint
   transformers_version="4.17.0",  # transformers version used
   pytorch_version="1.10.2",       # pytorch version used
   py_version="py38",              # python version of the DLC
)

# deploy model to SageMaker Inference (instance count/type here are illustrative)
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge",
)

To use the latest version of the transformers library in a SageMaker DLC you don’t have to provide a custom inference script, just a requirements.txt file with a line that says transformers==4.23.1

The DLC will then install/update to the specified transformers version. Not entirely sure if that will solve your issue but it’s always good to at least try it out with the latest version :slight_smile:

Thanks @marshmellow77!

Would you happen to know where would one need to place this requirements.txt file? I do not seem to be able to find a configuration option in the HuggingFaceModel that would allow me to pass its location on its instantiation or deployment but I might have very easily missed something :slight_smile:

Ah, I see, because you’re deploying directly from the hub via the env parameter … hmm …

I usually download the model locally and put the requirements.txt file into the model directory, like it is done in this example (and as described in this documentation).
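Roughly, the repacking looks like this (a sketch; the code/ subdirectory is the conventional place the SageMaker Hugging Face inference toolkit looks for requirements.txt, and the directory names are just examples):

```python
import os
import tarfile

# download/copy the model files into a local "model/" directory first,
# then add the requirements.txt under model/code/
os.makedirs("model/code", exist_ok=True)
with open("model/code/requirements.txt", "w") as f:
    f.write("transformers==4.23.1\n")

# repack everything into model.tar.gz for upload to S3
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for name in os.listdir("model"):
        tar.add(os.path.join("model", name), arcname=name)
```

The resulting model.tar.gz then gets uploaded to S3 and passed to HuggingFaceModel via model_data instead of the hub env variables.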

But the HuggingFaceModel class also has an entry_point parameter and I’m wondering if it can be used in conjunction with the env parameter. You could try it out, and if it doesn’t work you can still fall back to putting the requirements.txt file directly into the model directory, as in the example I mentioned.
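Untested, but the combination I have in mind would look something like this (inference.py and the code/ directory are hypothetical names, and whether the toolkit honors both at once is exactly the open question):

```python
# Hypothetical kwargs combining the hub env config with a custom entry point
# (untested idea -- not a confirmed feature of the SageMaker SDK).
model_kwargs = {
    "transformers_version": "4.17.0",
    "pytorch_version": "1.10.2",
    "py_version": "py38",
    "env": {
        "HF_MODEL_ID": "google/flan-t5-small",
        "HF_TASK": "text2text-generation",
    },
    "entry_point": "inference.py",  # hypothetical custom script
    "source_dir": "code",           # local dir that would also hold requirements.txt
}
# huggingface_model = HuggingFaceModel(role=role, **model_kwargs)
```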

Hope that helps!


Thanks @marshmellow77!

I am afraid this didn’t work for me, so I fell back to putting requirements.txt into the model directory. I had wanted to avoid that, as the model is quite large (39 GB) and comes straight from the HF Hub; all I was adding was that requirements.txt.

Thanks again for your help!

@philschmid just for the sake of completeness, is there a chance I’ve missed anything? Is it possible to use entry_point in conjunction with the environment variables that are set via the env parameter?