~/miniconda3/envs/ner/lib/python3.8/site-packages/sagemaker/image_uris.py in _validate_arg(arg, available_options, arg_name)
305 """Checks if the arg is in the available options, and raises a ``ValueError`` if not."""
306 if arg not in available_options:
--> 307 raise ValueError(
308 "Unsupported {arg_name}: {arg}. You may need to upgrade your SDK version "
309 "(pip install -U sagemaker) for newer {arg_name}s. Supported {arg_name}(s): "
ValueError: Unsupported image scope: eia. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): training, inference.
The model deploys successfully if I do not provide an accelerator (i.e., no Elastic Inference).
Do the Hugging Face SageMaker models support EI? If so, how might I deploy the model successfully with EI? And if not, is EI support on the roadmap?
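For context, the failing call looks roughly like the sketch below, where huggingface_model is a standard HuggingFaceModel (full config further down in this thread); the instance and accelerator types are just example values:

# Sketch of the deploy call that raises the error above.
# NOTE: instance_type and accelerator_type are example values.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",       # CPU host instance
    accelerator_type="ml.eia2.medium",  # requesting EI triggers the "eia" image scope lookup
)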
Sadly, we don't have EI DLCs yet. We are working on it, and it is on the roadmap with one of the highest priorities.
I will update this thread when I have any news.
Sorry! I should have been clearer: I meant for inference. I had actually tried running inference on ml.inf1.xlarge, but it didn't seem to work, hence the question.
Inferentia is also not yet supported, since we need to create a separate DLC for the Inferentia instances, but we are on it.
Other than that, every CPU/GPU instance type should be supported.
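For example, a plain CPU/GPU deployment like the following sketch works with the current DLCs, assuming huggingface_model is an already-configured HuggingFaceModel (the instance type is just an example):

# Deploying to a regular GPU instance is supported today.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # example GPU type; CPU types such as ml.m5.xlarge also work
)
result = predictor.predict({"inputs": "I love using SageMaker!"})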
@philschmid As requested, please find the error details below.
ValueError: Unsupported image scope: eia. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): training, inference.
SageMaker SDK version: 2.87.0
from sagemaker import get_execution_role
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=model_data,            # path to your trained SageMaker model
    role=get_execution_role(),        # IAM role with permissions to create an endpoint
    entry_point="deploy_ei.py",       # custom inference script
    transformers_version="4.12.3",    # Transformers version used
    pytorch_version="1.9.1",          # PyTorch version used
    py_version="py38",                # Python version used
    sagemaker_session=sagemaker_session,
)
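The ValueError appears to come from the SDK's image URI lookup. A minimal sketch that reproduces it directly, assuming the argument values the SDK resolves internally mirror the model config above:

from sagemaker import image_uris

# Raises ValueError: "eia" is not among the registered image scopes
# ("training", "inference") for the huggingface framework.
image_uris.retrieve(
    framework="huggingface",
    region="us-east-1",                     # example region
    version="4.12.3",                       # Transformers version
    base_framework_version="pytorch1.9.1",  # underlying PyTorch version
    py_version="py38",
    instance_type="ml.m5.xlarge",
    image_scope="eia",                      # unsupported scope -> ValueError
)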
We are interested in a cost-effective solution, and also in hosting multiple models in one container.
But I think we cannot host multiple models in one container behind one endpoint with either Elastic Inference or Inferentia; it is only possible with CPU-based instances (see the sketch below). Thanks
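For reference, a CPU-based multi-model endpoint can be sketched with the SDK's MultiDataModel. The name and S3 prefix below are placeholders, and it is worth verifying that the Hugging Face DLC supports multi-model mode before relying on this:

from sagemaker.multidatamodel import MultiDataModel

# One container, many model artifacts, on a CPU instance.
# The name and S3 prefix are placeholders.
mme = MultiDataModel(
    name="hf-multi-model",
    model_data_prefix="s3://my-bucket/models/",  # S3 prefix holding the model .tar.gz files
    model=huggingface_model,                     # reuse the container/config defined above
    sagemaker_session=sagemaker_session,
)
predictor = mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# Route each request to a specific artifact under the prefix.
result = predictor.predict(
    {"inputs": "Multi-model endpoints on CPU instances."},
    target_model="model-a.tar.gz",               # placeholder artifact name
)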