Hi all,
I am deploying the openchat3.5 model on SageMaker. First I uploaded all the files to S3, managed access, and deployed the model successfully. But I get an error when I try to ask it questions. Does anybody know how to use "predict" or "invoke_endpoint" with my given model?
Below is the code that shows what I am doing, followed by the error:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# task for the Hugging Face inference container
hub2 = {
    "HF_TASK": "text-generation",
}

model_path = "s3://penchatbotmodel/model.tar.gz"

huggingface_model2 = HuggingFaceModel(
    role=role,
    env=hub2,
    py_version="py36",
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    model_data=model_path,
)

predictor = huggingface_model2.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="ChatBotPoint2",
)
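# Sanity check: the deploy above completes and the endpoint reports
# InService before I send any requests (illustrative check via boto3):
import boto3

status = boto3.client("sagemaker").describe_endpoint(
    EndpointName="ChatBotPoint2"
)["EndpointStatus"]
print(status)  # prints "InService"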
prompt = """<|prompter|>How can I stay more active during winter? Give me 3 tips.<|endoftext|><|assistant|>"""

# hyperparameters for the LLM
payload = {
    "inputs": prompt,
    "parameters": {
        "do_sample": True,
        "top_p": 0.7,
        "temperature": 0.7,
        "top_k": 50,
        "max_new_tokens": 256,
        # "repetition_penalty": 1.03,
        # "stop": ["<|endoftext|>"],
    },
}

predictor.predict(payload)
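Since predict is just a wrapper around the InvokeEndpoint API, here is what I understand the equivalent raw boto3 call to be (a sketch, assuming the endpoint name and payload above), and as far as I can tell it fails the same way:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="ChatBotPoint2",
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode("utf-8"))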
The Error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027mistral\u0027"
}"
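My guess is that the 'mistral' in the message is the container choking on model_type: "mistral" in the model's config.json: openchat3.5 is a Mistral-based model, and the Mistral architecture was only added in transformers 4.34, while the container above pins transformers 4.6.1. If that is right, redeploying on a newer Hugging Face DLC should fix it. A sketch of what I plan to try (the version strings are assumptions and need to match a combination the SageMaker DLCs actually ship):

huggingface_model3 = HuggingFaceModel(
    role=role,
    env={"HF_TASK": "text-generation"},
    model_data=model_path,
    # assumed combo; must match an available Hugging Face DLC that ships
    # transformers >= 4.34 (the first release with Mistral support)
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)
predictor = huggingface_model3.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="ChatBotPoint3",  # hypothetical new endpoint name
)

Can anyone confirm that this is the right fix, or should I be using the dedicated LLM (TGI) container for a model of this size instead?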