Deploying a conversational pipeline on AWS

I am following the instructions for deploying a custom microsoft/DialoGPT-medium model on AWS SageMaker. As a first step, I am using the instructions included on the model hub, but they do not work and appear to be out of date.

The code that follows leads to this error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "ConversationalPipeline expects a Conversation or list of Conversations as an input"
}

Code from the documentation:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'microsoft/DialoGPT-medium',
	'HF_TASK':'conversational'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.6.1',
	pytorch_version='1.7.1',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': {
		"past_user_inputs": ["Which movie is the best ?"],
		"generated_responses": ["It's Die Hard for sure."],
		"text": "Can you explain why ?",
	}
})

I can turn those lists into Conversation objects, but even then I still get an error, and I am not sure they would be JSON-serializable anymore.
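For reference, here is roughly what I tried locally (a minimal sketch, assuming a transformers 4.x install where transformers.Conversation still takes text / past_user_inputs / generated_responses); the serialization problem is easy to reproduce without SageMaker:

from transformers import Conversation
import json

# Build a Conversation equivalent to the payload above
conv = Conversation(
	text="Can you explain why ?",
	past_user_inputs=["Which movie is the best ?"],
	generated_responses=["It's Die Hard for sure."],
)

# Conversation is a plain Python object, not a dict, so this raises
# TypeError: Object of type Conversation is not JSON serializable
json.dumps({'inputs': conv})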

The larger issue is that I want to use a fine-tuned microsoft/DialoGPT-medium that I trained on my local machine. I am following a Hugging Face YouTube tutorial for that, but once again the code presented in the video does not work out of the box.

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://xxxxxxxxxxxxxxx/model.tar.gz",  # path to your trained SageMaker model
   role=role,                                            # IAM role with permissions to create an endpoint
   transformers_version="4.6",                           # Transformers version used
   pytorch_version="1.7",                                # PyTorch version used
   py_version='py36',                                    # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

This fails with the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "(\"You need to define one of the following [\u0027feature-extraction\u0027, \u0027text-classification\u0027, \u0027token-classification\u0027, \u0027question-answering\u0027, \u0027table-question-answering\u0027, \u0027fill-mask\u0027, \u0027summarization\u0027, \u0027translation\u0027, \u0027text2text-generation\u0027, \u0027text-generation\u0027, \u0027zero-shot-classification\u0027, \u0027conversational\u0027, \u0027image-classification\u0027] as env \u0027TASK\u0027.\", 403)"

This is my fourth time deploying a Hugging Face model to AWS SageMaker, and the process is incredibly complex and non-intuitive. I feel like the entire thing is held up by toothpicks. Thanks for any light you can shed.

I think you forgot to set the HF_TASK environment variable when creating your HuggingFaceModel:

huggingface_model = HuggingFaceModel(
   model_data="s3://xxxxxxxxxxxxxxx/model.tar.gz",  # path to your trained SageMaker model
   role=role,                                            # IAM role with permissions to create an endpoint
   transformers_version="4.6",                           # Transformers version used
   pytorch_version="1.7",                                # PyTorch version used
   py_version='py36',                                    # Python version used
   env={ 'HF_TASK':'conversational' },
)
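If it still fails after that, it can help to rule out the SDK serializers and hit the endpoint with the raw InvokeEndpoint call your error message refers to. A minimal sketch using boto3, assuming the predictor from your deploy code above:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# The same request predictor.predict() sends under the hood
response = runtime.invoke_endpoint(
	EndpointName=predictor.endpoint_name,
	ContentType="application/json",
	Body=json.dumps({
		'inputs': {
			"past_user_inputs": ["Which movie is the best ?"],
			"generated_responses": ["It's Die Hard for sure."],
			"text": "Can you explain why ?",
		}
	}),
)
print(json.loads(response["Body"].read()))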

Hey @JB2022,

We added support for the conversational pipeline in a later release. Could you use transformers_version="4.12" instead of "4.6", and pytorch_version="1.9" instead of "1.7"?

You can find the whole list of available containers here: Reference

Then your first code snippet should work.

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'microsoft/DialoGPT-medium',
	'HF_TASK':'conversational'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.12',
	pytorch_version='1.9',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

response = predictor.predict({
	'inputs': {
		"past_user_inputs": ["Which movie is the best ?"],
		"generated_responses": ["It's Die Hard for sure."],
		"text": "Can you explain why ?",
	}
})
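To keep the conversation going, you can feed the response back in as history. A hedged sketch, assuming the output has the conversational task's usual shape (a generated_text field plus an updated conversation dict with the accumulated turns; check the actual response for your container version):

print(response["generated_text"])   # assumed field: the model's latest reply
history = response["conversation"]  # assumed field: the updated history

# The next turn reuses the accumulated history
predictor.predict({
	'inputs': {
		"past_user_inputs": history["past_user_inputs"],
		"generated_responses": history["generated_responses"],
		"text": "What about the sequels ?",
	}
})

When you are done testing, predictor.delete_endpoint() deletes the endpoint so the instance stops billing.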

@JB2022 I would LOVE to have a chat with you… you say you’ve done this four times?! Please tell me how? :pray: Any chance at all you could DM me on Twitter? My username / Twitter page is on my Hugging Face profile.

I also faced the same issue today, tried @CyranoB’s idea, and added the env variable:

hub = {
	'HF_TASK':'text-classification'
}

Then attached this value to the HuggingFaceModel class as below:

huggingface_model = HuggingFaceModel(
	model_data="s3://xxxxxxxxxxxxxxx/model.tar.gz",
	role=role,
	transformers_version="4.6",
	pytorch_version="1.7",
	py_version='py36',
	env=hub,
)

This did not work… so I tried the next option, where I provided the model id too:

hub = {
	'HF_MODEL_ID':'distilbert-base-uncased-finetuned-sst-2-english',
	'HF_TASK':'text-classification'
}

Then deployed. After that, the predict method worked.
PS: I have not reviewed the underlying code of the predict method.

Hello @Kamaljp, could you share how you created the model.tar.gz? Also, you normally don’t need to add HF_TASK when deploying a model.tar.gz.
You should also update the versions you use to the latest ones, which are 4.26.0, 1.13.1, and py39.
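For reference, the container expects the archive contents at the root of the tarball (config.json, pytorch_model.bin, the tokenizer files, and optionally code/inference.py), not nested inside a sub-directory. A minimal packaging sketch, with a hypothetical local path:

import os
import tarfile

model_dir = "path/to/finetuned-dialogpt"  # hypothetical local directory with the saved model

# Add every file at the archive root, which is the layout the
# SageMaker Hugging Face inference container expects
with tarfile.open("model.tar.gz", "w:gz") as tar:
	for name in os.listdir(model_dir):
		tar.add(os.path.join(model_dir, name), arcname=name)

Also, if I read the inference toolkit right, setting HF_MODEL_ID makes the container download that Hub model, so in your working setup the weights from model.tar.gz are probably not the ones actually being served.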

Hi @philschmid, is there an easy way to add streaming to conversational endpoints with a bit of customization? Streaming is quite handy now that models are larger (Flan-T5, etc.), and SSE is probably required for it. I saw this (GitHub - huggingface/text-generation-inference: Large Language Model Text Generation Inference), but it seems like overkill. Any advice is really appreciated.

Cheers

The SageMaker platform currently doesn’t support SSE (Server-Sent Events), which is needed for streaming.

Hi @philschmid! I see you mentioned that SSE isn’t currently supported by SageMaker endpoints; however, I was wondering whether this has changed now with the LLM inference container you released for AWS (Introducing the Hugging Face LLM Inference Container for Amazon SageMaker).

I am currently looking into options for hosting LLMs with streaming enabled, so any input is welcome.

Cheers


Inference Endpoints supports streaming; see: Deploy LLMs with Hugging Face Inference Endpoints.