Hi
Trying to deploy mistralai/Mistral-7B-Instruct-v0.1
model on AWS Sagemaker using the Hugging Face LLM DLC. Here is my code:
import json
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri
# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
"huggingface",
version="1.0.3"
)
# sagemaker config
instance_type = "ml.g5.2xlarge"
number_of_gpu = 1
health_check_timeout = 600
# Define Model and Endpoint configuration parameter
config = {
'HF_MODEL_ID': "mistralai/Mistral-7B-Instruct-v0.1",
'SM_NUM_GPUS': json.dumps(number_of_gpu),
'MAX_INPUT_LENGTH': json.dumps(2048),
'MAX_TOTAL_TOKENS': json.dumps(4096),
'MAX_BATCH_TOTAL_TOKENS': json.dumps(8192),
'HUGGING_FACE_HUB_TOKEN': "hf_****"
}
# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
role=role,
image_uri=llm_image,
env=config,
code_location=f"s3://{S3_BUCKET}/"
)
llm = llm_model.deploy(
initial_instance_count=1,
instance_type=instance_type,
endpoint_name="mistral-7b-instruct",
container_startup_health_check_timeout=health_check_timeout,
tags=[{"Key": "ENV", "Value": "dev"}]
)
The deployment fails with the following errors:
Download encountered an error: Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 195, in download_weights
utils.convert_files(local_pt_files, local_st_files, discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 106, in convert_files
convert_file(pt_file, sf_file, discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 68, in convert_file
to_removes = _remove_duplicate_names(loaded, discard_names=discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 32, in _remove_duplicate_names
raise RuntimeError(
RuntimeError: Error while trying to find names to remove to save state dict, but found no suitable name to keep for saving amongst: {'model.norm.weight'}. None is covering the entire storage.Refusing to save/load the model since you could be storing much more memory than needed. Please refer to https://huggingface.co/docs/safetensors/torch_shared_tensors for more information. Or open an issue.
Any idea how to solve this issue and make the deployment work? Thanks in advance for your help