I get the same error here for CodeLlama-7b-hf-Instruct.
I even included a requirements.txt file in model.tar.gz requiring transformers==4.33.2, but it doesn't work.
Any ideas?
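For reference, the packaging step described above can be sketched as follows (a minimal example, assuming the SageMaker Hugging Face inference convention of placing requirements.txt under a code/ directory inside the archive; paths and package versions are illustrative):

```shell
# build a model.tar.gz with a code/requirements.txt inside it
mkdir -p model/code
echo "transformers==4.33.2" > model/code/requirements.txt
# model weights and config files would also go under model/ here
tar -czf model.tar.gz -C model .
# inspect the archive to confirm the layout
tar -tzf model.tar.gz
```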
Do you mean transformers==4.33.2?
@jeremydd Yup, I fine-tuned Llama-7b on my own data and deployed it on SageMaker.
Hello @Mit1208! I have tried to deploy the fine-tuned Llama-7b with my own data, with TGI v1.0 as you mentioned, and it still gives me the error
"FileNotFoundError: No local weights found in /opt/ml/model with extension .bin"
Could you give me more details on how you managed to deploy it on SageMaker? Thanks!
Hi @cnicu,
I was getting the same error when I didn't specify my model path. I used a spot instance while training, so my checkpoints were on S3.
import json
from sagemaker.huggingface import HuggingFaceModel

config = {
    'HF_MODEL_ID': '/opt/ml/model',            # path where SageMaker stores the model
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),      # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(2048),      # max length of the generation (including input text)
}

# create HuggingFaceModel
llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    model_data=huggingface_estimator.model_data,
    env=config,
)

# deploy model to an endpoint
llm = llm_model.deploy(
    endpoint_name="llama-2-7b-finetuned",
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)
Here model_data is the S3 URI of the checkpoint. Make sure a model actually exists at that S3 path.
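To confirm the artifact really exists before deploying, a quick check with boto3 might look like this (a sketch; it assumes standard AWS credentials are configured, as in a SageMaker notebook, and the helper names are made up for illustration):

```python
def parse_s3_uri(s3_uri: str):
    """Split an s3:// URI into (bucket, key)."""
    bucket, _, key = s3_uri.removeprefix("s3://").partition("/")
    return bucket, key

def s3_object_exists(s3_uri: str) -> bool:
    """Return True if the URI points to an existing S3 object (needs AWS credentials)."""
    import boto3  # assumed available, as it is in SageMaker notebook environments
    bucket, key = parse_s3_uri(s3_uri)
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return True
    except Exception:
        return False
```

For example, `s3_object_exists(huggingface_estimator.model_data)` should return True before you call deploy().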
maybe try using this:
llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.8.2"
)
it worked for me.
Thank you! Worked for me!
Could you tell me which instance I would need to deploy a fine-tuned llama2-13b? With the same configuration I used to successfully deploy llama2-7b, I cannot deploy the 13b model. I have tried the ml.g5.8xlarge instance and that did not work either.
Try ml.g5.12xlarge, per the AWS guide.
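For a rough sense of why this helps: ml.g5.8xlarge has a single 24 GB A10G GPU, while ml.g5.12xlarge has four of them. A back-of-envelope estimate (ignoring the KV cache and activations, which need additional headroom) shows a 13B model in fp16 barely fits the weights alone on one A10G:

```python
# back-of-envelope GPU memory for a 13B-parameter model in fp16
params = 13e9          # 13B parameters
bytes_per_param = 2    # fp16/bf16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1024**3
print(f"~{weights_gb:.0f} GB just for the weights")  # ~24 GB just for the weights
```

With the KV cache and activation memory on top, a single 24 GB GPU is not enough, which is why sharding across the four GPUs of a g5.12xlarge (SM_NUM_GPUS=4) works.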