Hey, thanks for the jumping-off point! I took your example and it deploys fine, but when I try to use the endpoint, I can't get HF_TASK to be picked up. I created my model with the following Terraform:
resource "aws_sagemaker_model" "huggingface" {
name = "bertModel"
execution_role_arn = "<MY_ARN>"
primary_container {
# CPU Image
image="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04"
model_data_url="s3://model_bucket/model.tar.gz"
environment = {
HF_TASK = "feature-extraction"
HF_MODEL_ID = "sentence-transformers/msmarco-distilbert-base-v3"
SAGEMAKER_REGION = "us-east-1"
SAGEMAKER_CONTAINER_LOG_LEVEL = 20
}
}
}
But when I invoke that deployed endpoint, I get:
[ERROR] ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "Task couldn't be inferenced from BertModel.Inference Toolkit can only inference tasks from architectures ending with ['TapasForQuestionAnswering', 'ForQuestionAnswering', 'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', 'T5WithLMHeadModel'].Use env `HF_TASK` to define your task."
}
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/tfDeployedModel in account ********* for more information.
Traceback (most recent call last):
File "/var/task/postProcessTransformer.py", line 22, in lambda_handler
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
File "/var/runtime/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
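For context, the invoking code in postProcessTransformer.py is essentially this (trimmed down; the payload shape and how ENDPOINT_NAME is set are simplified here):

import json
import os

import boto3

# SageMaker runtime client used to call the deployed endpoint
runtime = boto3.client('sagemaker-runtime')

# Endpoint name is configured via the Lambda's environment
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']

def lambda_handler(event, context):
    # This is the call that raises the 400 above
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=json.dumps({'inputs': event['text']}))
    return json.loads(response['Body'].read())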
This same Lambda worked just fine with my notebook-created model, which was defined like this:
from sagemaker.huggingface import HuggingFaceModel

hub = {
    'HF_MODEL_ID': 'sentence-transformers/msmarco-distilbert-base-v3',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction'  # NLP task you want to use for predictions
}

huggingface_model = HuggingFaceModel(
    model_data="https://model-bucket.s3.amazonaws.com/model.tar.gz",  # path to your trained SageMaker model
    env=hub,
    role=role,                     # IAM role with permissions to create an endpoint
    transformers_version="4.6.1",  # Transformers version used
    pytorch_version="1.7.1",       # PyTorch version used
    py_version='py36',             # Python version used
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-gpu-py36-cu110-ubuntu18.04'
)
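...and deployed it with something along these lines (the instance type here is just an example):

predictor = huggingface_model.deploy(
    initial_instance_count=1,        # instances behind the endpoint
    instance_type='ml.g4dn.xlarge'   # example type; matches the GPU image above
)

# Quick sanity check against the notebook-created endpoint
print(predictor.predict({'inputs': 'this is a test sentence'}))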
Obviously the Python and Transformers versions are different between the two, but I've also tried making them match, and the error persists.
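One way I can think of to confirm the environment block actually reached the Terraform-created model is to describe it with boto3 (a sketch, assuming the model name from the Terraform above):

import boto3

sm = boto3.client('sagemaker', region_name='us-east-1')

# If the environment map was applied, HF_TASK should show up here
desc = sm.describe_model(ModelName='bertModel')
print(desc['PrimaryContainer']['Environment'])

Any pointers on why the container isn't picking up HF_TASK would be appreciated.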