I am trying to deploy the pretrained version of the model below to a SageMaker endpoint.
Model name: mistralai/Mixtral-8x7B-Instruct-v0.1
Instance type: ml.g5.48xlarge
The deployment fails with the following error:
```
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 314, in get_model
    raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type mixtral
```
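For context on where the message comes from: TGI's `get_model` reads the `model_type` field from the model's `config.json` and raises exactly this error when the running container version has no implementation for that architecture. A rough, simplified sketch of that dispatch (the `SUPPORTED` set below is illustrative, not TGI's actual registry):

```python
# Simplified sketch of TGI-style model dispatch: the server maps the
# checkpoint's model_type to an implementation and fails otherwise.
# SUPPORTED is a hypothetical registry, not TGI's real one.
SUPPORTED = {"llama", "mistral", "gpt_neox"}

def get_model(model_type: str) -> str:
    if model_type not in SUPPORTED:
        raise ValueError(f"Unsupported model type {model_type}")
    return f"loaded {model_type}"

try:
    get_model("mixtral")
except ValueError as err:
    print(err)  # -> Unsupported model type mixtral
```

So the error indicates the container image does not know the `mixtral` architecture, independent of the instance type chosen.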
Here is the deployment code I used:
```python
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="1.1.0"
)
print(f"llm image uri: {llm_image}")

# sagemaker config
role = sagemaker.get_execution_role()  # execution role for the endpoint
instance_type = "ml.g5.48xlarge"
number_of_gpu = 1
health_check_timeout = 1000

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': "mistralai/Mixtral-8x7B-v0.1",  # model_id from Models - Hugging Face
    'SM_NUM_GPUS': json.dumps(number_of_gpu),      # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),          # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(2048),          # max length of the generation (including input text)
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
    role=role,
    env=config,
    image_uri=llm_image,              # use the TGI image retrieved above
    transformers_version='4.28',      # the transformers version used in the training job
    pytorch_version='2.0',            # the pytorch version used in the training job
    py_version='py310',
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
    endpoint_name='mixtral-inference-testing1',
)
```
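For completeness, once an endpoint like this comes up, requests follow TGI's generate schema. A sketch of the request body I would send (parameter names from TGI's generate API; the prompt and values are illustrative):

```python
import json

# Example request body for a TGI-backed SageMaker endpoint; "inputs" and
# "parameters" follow text-generation-inference's generate schema.
payload = {
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
}
print(json.dumps(payload))

# Against the live endpoint, this would be sent with:
#   response = llm.predict(payload)
```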
Please let me know whether I am using the right versions or whether I am doing anything wrong.
Happy to provide any other information that is needed.
Thanks.