Serverless deploy troubles

Hi, I'm trying to deploy a serverless endpoint from model_data. I'm doing it in the same manner I deployed a similar model to an EC2 instance, but it seems to fail.

I do
huggingface_model = HuggingFaceModel(**model_params)
where
model_params = {
   'role': <exec_role>,
   'transformers_version': '4.6',
   'sagemaker_session': <sagemaker.session.Session object at 0x158528e50>,
   'pytorch_version': '1.7',
   'py_version': 'py36',
   'model_data': <path_to_S3>,
}

then
serverless_config = ServerlessInferenceConfig(
   memory_size_in_mb=memory_size_in_mb, max_concurrency=max_concurrency
)
huggingface_model.deploy(
   serverless_inference_config=serverless_config, endpoint_name=model_name, wait=wait)

Everything seems to deploy well, but then when I run it, I'm getting:
("You need to define one of the following ['feature-extraction', 'text-classification', 'token-classification', 'question-answering', 'table-question-answering', 'fill-mask', 'summarization', 'translation', 'text2text-generation', 'text-generation', 'zero-shot-classification', 'conversational', 'image-classification'] as env 'TASK'.", 403)

I tried adding env={"HF_TASK": "feature-extraction"} to the model creation, but I then get an error (which makes sense, since I'm not really specifying a model from the Hub):

"Can't load config for '/.sagemaker/mms/models/model'. Make sure that:

- '/.sagemaker/mms/models/model' is a correct model identifier listed on 'https://huggingface.co/models'

- or '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file"

Does anyone have an idea that could help?

Thank you,
Alex

Hi Alex

The way you instantiate the HuggingFaceModel class looks a bit unusual to me. I usually go about it this way:

huggingface_model = HuggingFaceModel(
   model_data="s3://hf-sagemaker-inference/model.tar.gz",  # path to your trained sagemaker model
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.17", # transformers version used
   pytorch_version="1.10", # pytorch version used
   py_version="py38", # python version of the DLC
)

In terms of serverless deployment, you seem to be doing everything right, as far as I can tell. Just make sure you use the latest DLC (i.e. specify the latest supported Transformers and PyTorch versions, 4.17 and 1.10 respectively, in this case). You can find the latest versions here: Reference
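
Putting the two pieces together, a minimal sketch of a serverless deployment on the latest DLC could look like this (the S3 path, memory size and concurrency are placeholder values; HF_TASK just tells the inference toolkit which pipeline task to run, which is what the TASK error above is asking for):

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

huggingface_model = HuggingFaceModel(
   model_data="s3://hf-sagemaker-inference/model.tar.gz",  # placeholder path to your model.tar.gz
   role=role,                                               # iam role with permissions to create an Endpoint
   transformers_version="4.17",
   pytorch_version="1.10",
   py_version="py38",
   env={"HF_TASK": "feature-extraction"},                   # task for the inference toolkit
)

serverless_config = ServerlessInferenceConfig(
   memory_size_in_mb=4096,  # placeholder value
   max_concurrency=10,      # placeholder value
)

predictor = huggingface_model.deploy(serverless_inference_config=serverless_config)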

You can also check out these two sample notebooks and mix and match to fit your use case:

Hope that helps!

Cheers
Heiko

Thank you, Heiko, for the response and references. It's really helpful. I tried the examples you shared, and what actually worked was both updating the library versions and setting the task via the HF_TASK environment variable:

huggingface_model = HuggingFaceModel(
   model_data="s3://smbdata-development/models/MiniLM-L6-H384-uncased/model.tar.gz",  # path to your trained sagemaker model
   role=get_role(), # iam role with permissions to create an Endpoint
   sagemaker_session=session,
   transformers_version="4.17.0", # transformers version used
   pytorch_version="1.10.2", # pytorch version used
   env={"HF_TASK": "feature-extraction"},
   py_version="py38" # python version of the DLC
)
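
For reference, invoking the deployed endpoint with the usual predict call then looks roughly like this (the input sentence is just a placeholder):

predictor = huggingface_model.deploy(
   serverless_inference_config=serverless_config, endpoint_name=model_name
)
embeddings = predictor.predict({"inputs": "This is a sample sentence."})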

However, when I try doing the same for another model, where I've overridden some of the functions, it doesn't work:

huggingface_model = HuggingFaceModel(
   model_data="s3://smbdata-development/models/all-MiniLM-L6-v2/model.tar.gz",  # path to your trained sagemaker model
   role=get_role(), # iam role with permissions to create an Endpoint
   sagemaker_session=session,
   transformers_version="4.17.0", # transformers version used
   pytorch_version="1.10.2", # pytorch version used
   env={"HF_TASK": "feature-extraction"},
   py_version="py38" # python version of the DLC
)

gives me:
"Can't load config for '/.sagemaker/mms/models/model'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file"

Hey @AlexG,

You most likely have some small issue in your model.tar.gz. You can follow this example on how to create a custom inference.py: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub
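
Roughly, the inference.py goes into a code/ folder inside the model.tar.gz and overrides the toolkit's default handlers. A minimal sketch of what that can look like (the mean-pooling feature-extraction logic here is just an illustration of a typical sentence-embedding setup, not necessarily the exact notebook code):

# code/inference.py -- sketch of overriding model_fn / predict_fn
import torch
from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # model_dir is the unpacked model.tar.gz
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    sentences = data.pop("inputs", data)
    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)
    # mean pooling over token embeddings, masked by the attention mask
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    embeddings = (output.last_hidden_state * mask).sum(1) / mask.sum(1)
    return {"vectors": embeddings.tolist()}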

@philschmid Thanks! I actually copied the exact implementation from your notebook for this test, then placed it under the "code" directory as you show…

Got suspicious following @philschmid's comment, so I ran his deployment code as is, and got the same error. Then I changed the model to msmarco-distilbert-dot-v5 and used the same inference.py file to override the functions.
Everything deployed with no errors. Possibly some issue with the MiniLM configuration.

Thanks for the help, everyone! The references were extremely helpful.