Deploying Hugging Face SageMaker Models with Elastic Inference

When I try to deploy a Hugging Face SageMaker model with Elastic Inference (requested via the accelerator_type parameter), I get an error.

Deploy Snippet:

predictor = huggingface_model.deploy(
    initial_instance_count=1,           # number of instances behind the endpoint
    instance_type="ml.t2.medium",       # CPU host instance
    accelerator_type="ml.eia2.medium"   # Elastic Inference accelerator (triggers the error below)
)

Error Msg:

~/miniconda3/envs/ner/lib/python3.8/site-packages/sagemaker/image_uris.py in _validate_arg(arg, available_options, arg_name)
    305     """Checks if the arg is in the available options, and raises a ``ValueError`` if not."""
    306     if arg not in available_options:
--> 307         raise ValueError(
    308             "Unsupported {arg_name}: {arg}. You may need to upgrade your SDK version "
    309             "(pip install -U sagemaker) for newer {arg_name}s. Supported {arg_name}(s): "

ValueError: Unsupported image scope: eia. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): training, inference.

The model deploys successfully if I do not provide an accelerator (i.e., no Elastic Inference).

Do the Hugging Face SageMaker models support EI? If yes, how might I deploy the model successfully with EI? And if not, is EI support on the roadmap?

Many thanks in advance! :smile:

Hey @schopra,

Sadly, we don't have EI DLCs yet. We are working on it, and it is on the roadmap with one of the highest priorities.
I will update this thread when I have any news.

Is there by any chance a list of supported instances at this time? Thanks!

Hey @ujjirox,

Supported instances for what: training, inference, or both? You can find an overview of supported instance types for SageMaker here: Amazon SageMaker Pricing – Amazon Web Services (AWS)

Sorry! I should have been clearer. I meant for inference. I had actually tried running inference with ml.inf1.xlarge, but it didn't seem to work, hence the question.

Thanks.

Hey @ujjirox,

Inferentia is also not yet supported, since we need to create a separate DLC for the Inferentia instances, but we are on it.
Other than that, every CPU / GPU instance type should be supported.
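
For illustration, a CPU/GPU deployment uses the same API with no accelerator argument; a minimal sketch, assuming the huggingface_model object from the snippet at the top of the thread (the GPU instance type here is an arbitrary example):

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"  # illustrative GPU instance; omit accelerator_type entirely
)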

Hey, any news regarding the EI DLC / INF DLC?

Hey @asafab,

Yes, I already opened a PR for the INF DLCs.
You can follow it here: [huggingface_pytorch][NEURON][build] Huggingface Neuron inference DLC by philschmid · Pull Request #1578 · aws/deep-learning-containers · GitHub

When it is merged and available, we will share it on social media and provide an example.

Very cool!
Thanks for the response. Do you have any news about the EI DLC too?

Hey there!

The PR regarding the INF DLCs seems to have been merged. Does that mean the ml.inf* instance family can now be used with Hugging Face models?

They are merged but not yet released. I hope they will be available in the next two weeks. We'll let you know on social media.

Hello @philschmid,

I read your article https://www.philschmid.de/huggingface-bert-aws-inferentia about Hugging Face model deployment on Inferentia instances (very good and clear, by the way).

Can this method be used for all model types and all tasks? I am particularly thinking of Seq2Seq models (BART, Pegasus, T5) for the summarization task.

Hello @YannAgora,

Yes, it can also be used for T5 or Pegasus. You can find more documentation here: Transformers MarianMT Tutorial — AWS Neuron documentation.
You can then use the NeuronGeneration code inside your inference.py.
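
To make that concrete, here is a rough sketch of such an inference.py, assuming the standard model_fn/predict_fn hooks of the SageMaker Hugging Face Inference Toolkit. NeuronGeneration stands for the wrapper class defined in the AWS Neuron MarianMT tutorial (its definition is omitted here; it is user code, not a library import), and the generation parameters are illustrative:

# inference.py (sketch) -- NeuronGeneration is assumed to be defined above,
# copied from the AWS Neuron MarianMT tutorial.
from transformers import AutoTokenizer

def model_fn(model_dir):
    # Load the tokenizer and the Neuron-compiled seq2seq wrapper from the model dir.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = NeuronGeneration.from_pretrained(model_dir)  # hypothetical, per the tutorial
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # Tokenize the request payload and run generation on the Neuron-compiled model.
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt", padding=True)
    output_ids = model.generate(**inputs, max_length=128, num_beams=4)
    return {"generated_text": tokenizer.batch_decode(output_ids, skip_special_tokens=True)}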

Hi @philschmid

Any updates on EI (Elastic Inference) DLCs for inference?
Can we start using EI accelerators with Hugging Face models for inference?

Thanks

What error are you seeing when you try to deploy an EI-backed endpoint?

@philschmid As requested, please find the error details below.

ValueError: Unsupported image scope: eia. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): training, inference.

SageMaker SDK version = 2.87.0

huggingface_model = HuggingFaceModel(
    model_data=model_data,            # path to your trained SageMaker model
    role=get_execution_role(),        # IAM role with permissions to create an endpoint
    entry_point="deploy_ei.py",
    transformers_version="4.12.3",    # Transformers version used
    pytorch_version="1.9.1",          # PyTorch version used
    py_version="py38",                # Python version used
    sagemaker_session=sagemaker_session
)
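
Presumably the failing call is the same deploy pattern from the top of the thread, e.g.:

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
    accelerator_type="ml.eia2.medium"  # requesting the EI accelerator raises the ValueError above
)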

Hello @philschmid

Any updates on this issue and EI (Elastic Inference) DLCs for inference? Thanks

Let me reach out to the AWS team again. I'll report back here as soon as I hear something.

Out of curiosity, why are you interested in EI rather than Inferentia?

We are interested in a cost-effective solution, and also in hosting multiple models in one container.
But I think we cannot host multiple models in one container behind one endpoint with either Elastic Inference or Inferentia; it's only possible with CPU-based instances. Thanks.
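
For reference, multi-model hosting on a CPU instance can be sketched with the SageMaker SDK's MultiDataModel; the bucket prefix, names, and instance type below are illustrative assumptions, not a verified setup:

from sagemaker.multidatamodel import MultiDataModel

# Wrap an existing HuggingFaceModel so one endpoint serves every *.tar.gz
# archive stored under a common S3 prefix (names here are hypothetical).
mme = MultiDataModel(
    name="hf-multi-model",
    model_data_prefix="s3://my-bucket/models/",
    model=huggingface_model,  # reuses the container and config of the single-model setup
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",  # per the discussion above, CPU-based instances only
)

# Route a request to a specific model archive under the prefix.
result = predictor.predict({"inputs": "Hello world"}, target_model="model-a.tar.gz")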

Hey @philschmid

Any updates on this issue and EI (Elastic Inference) DLCs for inference?

Thanks