Error loading finetuned llama2 model while running inference

Harisvossos10 · September 19, 2023, 1:12pm

I get the same error here for CodeLlama-7b-hf-Instruct.

I even included a requirements.txt file in model.tar.gz requiring transformers==4.33.2, but it doesn't work.
Any ideas?

jeremydd · September 19, 2023, 1:36pm

Do you mean transformers==4.33.2?

Mit1208 · September 19, 2023, 1:48pm

@jeremydd Yup, I fine-tuned llama-7b on my own data and deployed it on SageMaker.

cnicu · September 20, 2023, 8:14am

Hello @Mit1208! I tried to deploy the fine-tuned Llama-7b with my own data, using TGI v1.0 as you mentioned, and it still gives me the error:

“FileNotFoundError: No local weights found in /opt/ml/model with extension .bin”

Could you give me more details on how you managed to deploy it on SageMaker? Thanks!

Mit1208 · September 20, 2023, 12:20pm

Hi @cnicu,
I was getting the same error when I didn't specify my model path. I used a spot instance for training, so my checkpoints were on S3.

import json
from sagemaker.huggingface import HuggingFaceModel

# number_of_gpu, role, llm_image, huggingface_estimator, instance_type and
# health_check_timeout are assumed to be defined earlier in the notebook.
config = {
    'HF_MODEL_ID': '/opt/ml/model',  # path to where SageMaker stores the model
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),  # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(2048),  # max length of the generation (including input text)
}

# Create the HuggingFaceModel
llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    model_data=huggingface_estimator.model_data,  # S3 URI of the trained model
    env=config,
)

# Deploy the model to an endpoint
llm = llm_model.deploy(
    endpoint_name="llama-2-7b-finetuned",
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)

Here model_data is the S3 URI of the checkpoint. Make sure a model actually exists at that S3 URI.
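One way to check that before deploying is to download the archive and look inside it. The sketch below is not from the thread: the function name is made up for illustration, and it assumes the container expects weight files at the top level of model.tar.gz with a .bin or .safetensors extension (the error above only mentions .bin).

```python
import tarfile
from pathlib import PurePosixPath

# Assumption: these are the extensions the serving container can load.
WEIGHT_EXTENSIONS = (".bin", ".safetensors")


def top_level_weight_files(archive_path, extensions=WEIGHT_EXTENSIONS):
    """Return the weight files that sit at the root of a model.tar.gz archive."""
    found = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            path = PurePosixPath(member.name)
            # Treat "./pytorch_model.bin" and "pytorch_model.bin" the same way.
            parts = [part for part in path.parts if part != "."]
            # Files nested in sub-folders (e.g. checkpoint-100/) are skipped.
            if len(parts) == 1 and path.suffix in extensions:
                found.append(parts[0])
    return sorted(found)
```

Calling top_level_weight_files("model.tar.gz") on a local copy of the archive returns an empty list when the weights are missing or sit inside a sub-folder, which would be consistent with the "No local weights found" error.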

Harisvossos10 · September 20, 2023, 1:50pm

Maybe try using this:

from sagemaker.huggingface import get_huggingface_llm_image_uri

llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.8.2"
)

It worked for me.

cnicu · September 20, 2023, 4:17pm

Thank you! Worked for me!

Could you tell me which instance I would need to deploy a fine-tuned llama2-13b? With the same configuration that let me deploy llama2-7b, I cannot deploy 13b. I also tried the ml.g5.8xlarge instance, and that did not work either.

Mit1208 · September 20, 2023, 5:32pm

Try ml.g5.12xlarge, as the AWS guide suggests.
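A rough back-of-the-envelope check shows why the larger instance helps. This is only a sketch: it assumes fp16 weights (2 bytes per parameter), nominal parameter counts of 7e9 and 13e9, and 24 GiB per A10G GPU (ml.g5.8xlarge has one, ml.g5.12xlarge has four). It ignores the KV cache and activations, so real requirements are higher.

```python
A10G_MEMORY_GIB = 24  # memory per GPU on g5 instances
INSTANCE_GPUS = {"ml.g5.8xlarge": 1, "ml.g5.12xlarge": 4}


def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory needed for the weights alone, in GiB (fp16 by default)."""
    return num_params * bytes_per_param / 1024**3


def weights_fit(instance_type, num_params, bytes_per_param=2):
    """True if the weights fit in the instance's total GPU memory.

    Assumes the model is sharded across every GPU on the instance,
    i.e. SM_NUM_GPUS is set to the instance's GPU count.
    """
    total_gib = INSTANCE_GPUS[instance_type] * A10G_MEMORY_GIB
    return weight_memory_gib(num_params, bytes_per_param) < total_gib


for name in INSTANCE_GPUS:
    print(name, "7b:", weights_fit(name, 7e9), "13b:", weights_fit(name, 13e9))
```

Under these assumptions the 13b weights alone come to a little over 24 GiB, which already exceeds the single GPU on ml.g5.8xlarge, while the four GPUs on ml.g5.12xlarge leave room once SM_NUM_GPUS is raised to match.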
