I've deployed Llama 3 70B on SageMaker and was able to invoke the LLM from my SageMaker notebook after deploying. The problem is that when I try to run inference and stream the response back in my Python server, I get a weird error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpointWithResponseStream operation: Received client error (422) from primary with message "Failed to deserialize the JSON body into the target type: missing field model at line 1 column 180"
Here is my code to invoke the endpoint with streaming:
import json

import boto3
from transformers import AutoTokenizer

smr = boto3.client("sagemaker-runtime", region_name="us-east-1")

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct", token="hf-token"
)

messages = [
    {"role": "system", "content": "You are a friendly AI Assistant"},
    {"role": "user", "content": "hi!"},
]

# Build the Llama 3 chat prompt from the message list
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Stop generation on either the standard EOS token or <|eot_id|>
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

payload = {
    "max_new_tokens": 512,
    "eos_token_id": terminators,
    "do_sample": True,
    "temperature": 0.2,
    "top_p": 0.6,
    "return_full_text": False,
}

body = {
    "inputs": prompt,
    "parameters": payload,
    "stream": True,
}

response = smr.invoke_endpoint_with_response_stream(
    EndpointName="<my_endpoint>",
    Body=json.dumps(body),
    ContentType="application/json",
)
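For reference, this is roughly how I consume the event stream on the server side afterwards (a simplified sketch; variable names here are just for illustration and the real code buffers and parses the chunks):

# Iterate over the boto3 EventStream returned in response["Body"];
# each event may carry a PayloadPart with raw bytes of the streamed output
event_stream = response["Body"]
for event in event_stream:
    chunk = event.get("PayloadPart", {}).get("Bytes", b"")
    if chunk:
        # Each chunk is a UTF-8 encoded piece of the model's streamed output
        print(chunk.decode("utf-8"), end="", flush=True)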
Am I missing fields in the payload?