Deploying a conversational pipeline on AWS

I am following the instructions for deploying a custom microsoft/DialoGPT-medium model on AWS SageMaker. As a first step, I am using the instructions included on the model hub, but they do not work and appear to be out of date.

The code that follows leads to this error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "ConversationalPipeline expects a Conversation or list of Conversations as an input"
}

Code from the documentation:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'microsoft/DialoGPT-medium',
	'HF_TASK':'conversational'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.6.1',
	pytorch_version='1.7.1',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': {
		"past_user_inputs": ["Which movie is the best ?"],
		"generated_responses": ["It's Die Hard for sure."],
		"text": "Can you explain why ?",
	}
})

I can turn those lists into Conversation objects, but even then I get an error. I am also not sure the objects are still JSON serializable at that point.
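For what it's worth, the payload can stay as plain lists on the client side, which sidesteps the serialization question entirely. Here is a minimal sketch (the `ChatState` class and its method names are my own invention, not a SageMaker or transformers API) that keeps the history in the JSON-safe shape the conversational task expects:

```python
import json

# Hypothetical helper (my own names, not part of sagemaker or transformers):
# keeps the chat history as plain lists, which stay JSON-serializable and
# match the 'inputs' shape the conversational task expects.
class ChatState:
    def __init__(self):
        self.past_user_inputs = []
        self.generated_responses = []

    def payload(self, text):
        # Build the request body to pass to predictor.predict()
        return {
            "inputs": {
                "past_user_inputs": list(self.past_user_inputs),
                "generated_responses": list(self.generated_responses),
                "text": text,
            }
        }

    def record(self, user_text, model_reply):
        # Call after each turn to extend the history
        self.past_user_inputs.append(user_text)
        self.generated_responses.append(model_reply)

state = ChatState()
state.record("Which movie is the best ?", "It's Die Hard for sure.")
body = state.payload("Can you explain why ?")
json.dumps(body)  # plain dicts and lists serialize without trouble
```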

The larger issue is that I want to use a fine-tuned microsoft/DialoGPT-medium model that I trained on my local machine. I am following a Hugging Face YouTube tutorial for that, but once again the code presented in the video does not work out of the box.

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://xxxxxxxxxxxxxxx/model.tar.gz",  # path to your trained SageMaker model
   role=role,                                            # IAM role with permissions to create an endpoint
   transformers_version="4.6",                           # Transformers version used
   pytorch_version="1.7",                                # PyTorch version used
   py_version='py36',                                    # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

This fails with the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "(\"You need to define one of the following ['feature-extraction', 'text-classification', 'token-classification', 'question-answering', 'table-question-answering', 'fill-mask', 'summarization', 'translation', 'text2text-generation', 'text-generation', 'zero-shot-classification', 'conversational', 'image-classification'] as env 'TASK'.\", 403)"
}

This is my fourth time deploying a Hugging Face model to AWS SageMaker, and the process is incredibly complex and non-intuitive. I feel like the whole thing is held together by toothpicks. Thanks for any light you can shed.

I think you forgot to set the task via the env when creating your HuggingFaceModel:

huggingface_model = HuggingFaceModel(
   model_data="s3://xxxxxxxxxxxxxxx/model.tar.gz",  # path to your trained SageMaker model
   role=role,                                            # IAM role with permissions to create an endpoint
   transformers_version="4.6",                           # Transformers version used
   pytorch_version="1.7",                                # PyTorch version used
   py_version='py36',                                    # Python version used
   env={ 'HF_TASK':'conversational' },
)
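One other thing that bites people with `model_data`: the Hugging Face inference container expects the model files (config.json, the weights, and the tokenizer files) at the root of the archive, not inside a subdirectory. A quick packaging sketch for a locally fine-tuned model (the paths and the helper name are placeholders of my own):

```python
import pathlib
import tarfile

def package_model(model_dir, out_path="model.tar.gz"):
    # The Hugging Face inference container expects config.json, the weights,
    # and tokenizer files at the ROOT of the archive, not in a subdirectory.
    # Note: write out_path OUTSIDE model_dir, or the archive will try to
    # include itself.
    with tarfile.open(out_path, "w:gz") as tar:
        for f in pathlib.Path(model_dir).iterdir():
            tar.add(f, arcname=f.name)
    return out_path
```

After that, upload the tarball to S3 (e.g. with `aws s3 cp` or `sagemaker.Session().upload_data`) and point `model_data` at the resulting S3 URI.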

Hey @JB2022,

We added support for the conversational pipeline in a later release. Can you change transformers_version="4.6" to "4.12", and pytorch_version="1.7" to "1.9"?

You can find the whole list of available containers here: Reference

Then your first code snippet should work.

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'microsoft/DialoGPT-medium',
	'HF_TASK':'conversational'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.12',
	pytorch_version='1.9',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': {
		"past_user_inputs": ["Which movie is the best ?"],
		"generated_responses": ["It's Die Hard for sure."],
		"text": "Can you explain why ?",
	}
})
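Once the call succeeds, the response carries the generated reply plus the updated history, so you can feed it straight back into the next request. A small sketch with a stand-in response (the exact response shape below is how the conversational handler typically answers; verify against your own endpoint, and the `next_payload` helper is my own name, not an SDK function):

```python
# Stand-in for a real predictor.predict() result; the shape below is typical
# for the conversational task -- verify against your own endpoint's output.
response = {
    "generated_text": "Because it has the best action scenes.",
    "conversation": {
        "past_user_inputs": ["Which movie is the best ?",
                             "Can you explain why ?"],
        "generated_responses": ["It's Die Hard for sure.",
                                "Because it has the best action scenes."],
    },
}

def next_payload(response, text):
    # Reuse the updated history from the previous response for the next turn
    conv = response["conversation"]
    return {
        "inputs": {
            "past_user_inputs": conv["past_user_inputs"],
            "generated_responses": conv["generated_responses"],
            "text": text,
        }
    }

follow_up = next_payload(response, "Is it really a Christmas movie ?")
```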

@JB2022 I would LOVE to have a chat with you… you say you’ve done this four times?! Please tell me how? :pray: Any chance at all you could DM me on Twitter? My username / Twitter page is on my Hugging Face profile.