I’m encountering this as well, despite using the SDK as specified. I’ve tried both serverless and hosted inference and can’t get past it (the error being permission denied on the deep learning container, as noted above). I’m starting from a roberta-base model, so it isn’t that large in the grand scheme either. I’m using a custom inference.py script that comes from an estimator in the ML pipeline. It conforms to the docs in the inference-toolkit, but I wonder whether that is part of this. As a note, this is all part of a SageMaker ML project (hence you’ll see training_step below, which just points to an S3 path where the model.tar.gz is).
Here is my Model code for reference:
env = {
    'MMS_DEFAULT_WORKERS_PER_MODEL': '1'
}

model = HuggingFaceModel(
    name=model_step_name,
    image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-cpu-py38-ubuntu20.04",
    transformers_version="4.17",
    pytorch_version="1.10",  # matches the PyTorch version baked into image_uri
    py_version="py38",
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=sagemaker_session,
    env=env,
)
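For reference, my inference.py follows the standard model_fn/predict_fn layout that the inference-toolkit documents. Roughly, it looks like this sketch (details simplified; the text-classification task and the lazy pipeline import are just illustrative, not my exact script):

```python
# Sketch of an inference-toolkit style inference.py (simplified; task name illustrative)
def model_fn(model_dir):
    # transformers is imported lazily here so the file parses even where it isn't installed
    from transformers import pipeline
    # model_dir is the directory where SageMaker unpacks model.tar.gz
    return pipeline("text-classification", model=model_dir, tokenizer=model_dir)

def predict_fn(data, pipe):
    # data is the already-deserialized request body; accept {"inputs": ...} or raw input
    inputs = data.pop("inputs", data)
    return pipe(inputs)
```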
Hi,
Lately, I wanted to deploy the nllb-200-1.3B model to SageMaker serverless following this notebook, but I get
Failed. Reason: Ping failed due to insufficient memory.
I increased the memory to 6144 and then faced this error:
python: can't open file '/usr/local/bin/deep_learning_container.py': [Errno 13] Permission denied
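For context, a rough back-of-the-envelope estimate of what the model alone needs (assuming fp32 weights at 4 bytes per parameter) suggests the 6144 MB serverless maximum leaves very little headroom:

```python
# Rough weight-memory estimate for a 1.3B-parameter model (assumes fp32, 4 bytes/param)
params = 1.3e9
bytes_per_param = 4
weights_gb = params * bytes_per_param / 1024**3
print(f"{weights_gb:.1f} GB")  # ~4.8 GB for the weights alone, before runtime overhead
```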
Also here is the code:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig
# Hub model configuration
hub = {
    'HF_MODEL_ID': 'facebook/nllb-200-1.3B',
    'HF_TASK': 'translation'
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    env=hub,                      # configuration for loading the model from the Hub
    role=role,                    # IAM role with permissions to create an endpoint
    transformers_version="4.17",  # transformers version used
    pytorch_version="1.10",       # pytorch version used
    py_version='py38',            # python version used
)

# specify MemorySizeInMB and MaxConcurrency in the serverless config object
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,
    max_concurrency=10,
)

# deploy the endpoint
predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config
)
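Once the endpoint deploys, I would invoke it roughly like this (a sketch; the src_lang/tgt_lang parameter names and the NLLB-style language codes are my assumption for the translation task):

```python
# Example request payload for the translation task (parameter names assumed for NLLB)
payload = {
    "inputs": "Hello, how are you?",
    "parameters": {
        "src_lang": "eng_Latn",  # NLLB-style language codes (assumption)
        "tgt_lang": "fra_Latn",
    },
}
# predictor.predict(payload)  # would send this JSON to the serverless endpoint
```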
Can you help a little with this error and how to solve it?
Thanks