SageMaker Serverless Inference

I'm encountering this as well, despite using the SDK as specified. I've tried both serverless and hosted inference and can't get past it (the error being permission denied on the deep learning container, as noted above). I'm starting from a roberta-base model, so it isn't that large in the grand scheme of things either. I'm using a custom inference.py script, produced by an estimator in the ML pipeline; it conforms to the inference-toolkit docs, but I'm wondering whether it is part of the problem (a minimal sketch of such a script follows the Model code below). As a note, this is all part of a SageMaker ML project, hence the training_step reference below, which just points to an S3 path where the model.tar.gz lives.

Here is my Model code for reference:

from sagemaker.huggingface import HuggingFaceModel

env = {
    "MMS_DEFAULT_WORKERS_PER_MODEL": "1"  # cap MMS at one worker per model
}

model = HuggingFaceModel(
    name=model_step_name,
    image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-cpu-py38-ubuntu20.04",
    transformers_version="4.17",
    pytorch_version="1.10",  # must match the PyTorch 1.10.2 image pinned above
    py_version="py38",
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=sagemaker_session,
    env=env,
)
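
For reference, here is a minimal sketch of the shape of my inference.py, using the model_fn/predict_fn override points the inference-toolkit documents (the sequence-classification task and return format here are illustrative assumptions; my actual script differs):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def model_fn(model_dir):
    # model_dir is where SageMaker unpacks model.tar.gz
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # the toolkit's default input_fn deserializes the JSON request into a dict
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return {"logits": logits.tolist()}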

Hi,
Recently I tried to deploy the nllb-200-1.3B model to SageMaker serverless, following this notebook, but I get:

 Failed. Reason: Ping failed due to insufficient memory..

I increased the memory to 6144 MB and then hit this error:

python: can't open file '/usr/local/bin/deep_learning_container.py': [Errno 13] Permission denied

Also, here is the code:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

# Hub Model configuration. 
hub = {
    'HF_MODEL_ID':'facebook/nllb-200-1.3B',
    'HF_TASK':'translation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,                      # configuration for loading model from Hub
   role=role,                    # iam role with permissions to create an Endpoint
   transformers_version="4.17",  # transformers version used
   pytorch_version="1.10",       # pytorch version used
   py_version="py38",            # python version used
)

# Specify MemorySizeInMB and MaxConcurrency in the serverless config object
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144, max_concurrency=10,
)

# deploy the model to a serverless endpoint
predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config
)
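
Once deployed, I would invoke it through the returned predictor. A minimal sketch of the call, assuming the toolkit forwards "parameters" to the underlying translation pipeline (the sample sentence and the NLLB src_lang/tgt_lang codes are illustrative assumptions):

result = predictor.predict({
    "inputs": "Hello, world!",  # hypothetical sample input
    "parameters": {"src_lang": "eng_Latn", "tgt_lang": "fra_Latn"},  # assumed NLLB language codes
})
print(result)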

Can you help me understand this error and how to solve it?
Thanks :slightly_smiling_face: