How to deploy a Hugging Face model from S3 outside a Jupyter Notebook

I’ve successfully deployed my model from S3 in a Jupyter notebook. Unfortunately, my organization requires that all production AWS apps be 100% Terraform. The guide here says “you can also instantiate Hugging Face endpoints with lower-level SDK such as boto3 and AWS CLI, Terraform and with CloudFormation templates.”

So I’m fairly sure it’s possible to do this, but I can’t find any documentation anywhere on Terraform deployment.

Hey @sdegrace,

We already have an example for AWS CDK: cdk-samples/sagemaker-endpoint-huggingface at master · philschmid/cdk-samples · GitHub, and CDK is similar to Terraform.

But if your company wants to use Terraform, you can easily create the resources yourself.
For a successful deployment of a SageMaker endpoint you need:

  1. a SageMaker model: Terraform documentation
  2. an Endpoint Configuration: Terraform documentation
  3. a SageMaker Endpoint: Terraform documentation

Below you can find “pseudo” code of how this will look:

resource "aws_sagemaker_model" "huggingface" {
  name               = "bert"
  execution_role_arn = "arn:aws:iam::111111111111:role/service-role/AmazonSageMaker-ExecutionRole-20200101T000001"

  primary_container {
    # CPU image
    image          = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04"
    # GPU image
    # image        = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-gpu-py38-cu111-ubuntu20.04"
    model_data_url = "s3://your-model"
  }
}

resource "aws_sagemaker_endpoint_configuration" "huggingface" {
  name = "bert"

  production_variants {
    variant_name           = "variant-1"
    model_name             = aws_sagemaker_model.huggingface.name
    initial_instance_count = 1
    instance_type          = "ml.t2.medium"
  }
}

resource "aws_sagemaker_endpoint" "huggingface" {
  name                 = "bert"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.huggingface.name
}
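
If you don’t want to hard-code the container image URI, you can also look it up with the Python SageMaker SDK. A small sketch (the version numbers below are assumptions and need to match an available Hugging Face DLC release):

# Resolve the Hugging Face inference DLC image URI instead of hard-coding it.
# Version numbers are assumptions; check the available DLC releases.
from sagemaker import image_uris

image_uri = image_uris.retrieve(
    framework="huggingface",
    region="us-west-2",
    version="4.12.3",                       # transformers version
    base_framework_version="pytorch1.9.1",  # underlying framework version
    py_version="py38",
    image_scope="inference",
    instance_type="ml.m5.xlarge",           # a CPU instance type resolves to the CPU image
)
print(image_uri)

The resulting URI can then be passed as the image argument of the aws_sagemaker_model resource above.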

Hey, thanks for the jumping-off point! I’ve taken your example and it deploys, but when I try to use it, I can’t seem to get HF_TASK to work properly. I’ve created my model with the following Terraform:

resource "aws_sagemaker_model" "huggingface" {
  name               = "bertModel"
  execution_role_arn = "<MY_ARN>"

  primary_container {
    # CPU Image
    image="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04"
    model_data_url="s3://model_bucket/model.tar.gz"

    environment = {
        HF_TASK          = "feature-extraction"
        HF_MODEL_ID      = "sentence-transformers/msmarco-distilbert-base-v3"
        SAGEMAKER_REGION = "us-east-1"
        SAGEMAKER_CONTAINER_LOG_LEVEL = 20
      }
  }
}

But when I try to use that deployed endpoint, I get:

[ERROR] ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
    "code": 400,
    "type": "InternalServerException",
    "message": "Task couldn't be inferenced from BertModel.Inference Toolkit can only inference tasks from architectures ending with ['TapasForQuestionAnswering', 'ForQuestionAnswering', 'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', 'T5WithLMHeadModel'].Use env `HF_TASK` to define your task."
}
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/tfDeployedModel in account ********* for more information.
Traceback (most recent call last):
  File "/var/task/postProcessTransformer.py", line 22, in lambda_handler
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
  File "/var/runtime/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
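
For reference, the Lambda calls the endpoint roughly like this (a minimal sketch; the payload shape and variable names are assumptions based on the traceback above):

# Minimal sketch of the Lambda's call to the SageMaker endpoint.
# The payload shape and input field are assumptions, not the exact code.
import json
import boto3

ENDPOINT_NAME = "tfDeployedModel"
runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": event["text"]}),  # hypothetical input field
    )
    return json.loads(response["Body"].read())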

This same lambda worked just fine with my notebook-created model, which was defined like:

from sagemaker.huggingface import HuggingFaceModel

hub = {
    'HF_MODEL_ID': 'sentence-transformers/msmarco-distilbert-base-v3',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction'  # NLP task you want to use for predictions
}

huggingface_model = HuggingFaceModel(
    model_data="https://model-bucket.s3.amazonaws.com/model.tar.gz",  # path to your trained sagemaker model
    env=hub,
    role=role,  # iam role with permissions to create an Endpoint
    transformers_version="4.6.1",  # transformers version used
    pytorch_version="1.7.1",  # pytorch version used
    py_version='py36',  # python version used
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-gpu-py36-cu110-ubuntu18.04'
)
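
The endpoint itself then came from the SDK’s usual deploy call, roughly like this (a sketch; the instance settings here are placeholders, not the exact notebook values):

# Deploy the model defined above to a real-time endpoint and run a quick test.
# Instance type and count are placeholders, not the exact notebook values.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
print(predictor.predict({"inputs": "test sentence"}))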

Obviously the python and transformers versions are different, but I’ve also tried making them match :confused:

Can you confirm that Terraform has set both environment variables? It looks like the second one wasn’t set.


Ah, I think you need to provide the env for HF_TASK as well in the Terraform example; there is no env var defined.

Below is what I see in the console. I can’t see any difference between the one I’ve created using a Jupyter notebook and the one created with Terraform.

Not certain I understand what you mean by providing the env for HF_TASK. Do you mean I should add a new env variable under my “huggingface” model in the Terraform?

Is this the screenshot of the endpoint deployed with the SageMaker SDK or with Terraform?

Beware that when you set HF_MODEL_ID and also provide a model_data, the model_data is ignored.

Not certain I understand what you mean by providing the env for HF_TASK. Do you mean I should add a new env variable under my “huggingface” model in the Terraform?

You need to add HF_TASK as an environment variable in the Terraform aws_sagemaker_model resource: Terraform Registry

The following ended up being the solution:

resource "aws_sagemaker_model" "huggingface" {
  name               = "bertModel"
  execution_role_arn = "arn:aws:iam::***************:role/apps/my-sagemaker-role"

  primary_container {
    # GPU image (the same one the notebook deployment used)
    image          = "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-gpu-py36-cu110-ubuntu18.04"
    model_data_url = "s3://my-s3-bucket/model.tar.gz"

    environment = {
      HF_TASK = "feature-extraction"
    }
  }
}

resource "aws_sagemaker_endpoint_configuration" "huggingface" {
  name = "bertEndpointConfig"

  production_variants {
    variant_name           = "variant-1"
    model_name             = aws_sagemaker_model.huggingface.name
    initial_instance_count = 1
    instance_type          = "ml.t2.medium"
  }
}

resource "aws_sagemaker_endpoint" "huggingface" {
  name                 = "tfDeployedModel"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.huggingface.name
}

The big issue came from a newer image URI, which was causing an exception in my custom post-processing code.
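
To sanity-check the Terraform-deployed endpoint from Python, attaching a predictor to the existing endpoint works. A sketch (the endpoint name matches the aws_sagemaker_endpoint above; the sample input is arbitrary):

# Attach to the endpoint created by Terraform and run a test request.
from sagemaker.huggingface import HuggingFacePredictor

predictor = HuggingFacePredictor(endpoint_name="tfDeployedModel")

# feature-extraction returns nested lists of token embeddings
embeddings = predictor.predict({"inputs": "How do I deploy from Terraform?"})
print(len(embeddings[0]), "token embeddings returned")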
