I used the following code to deploy a fine-tuned Llama 2 model with a custom inference script. The problem is that the custom inference.py script is not being used: I have made changes to predict_fn and model_fn, but the endpoint still returns the usual response.
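For context, my inference.py defines the standard model_fn/predict_fn handlers used by the SageMaker Hugging Face toolkit. This is only a simplified sketch of its structure (not my exact script; the loading and generation parameters are placeholders):

# inference.py (simplified sketch, placeholder logic)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # model_dir is where SageMaker extracts model.tar.gz (/opt/ml/model)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16).to("cuda")
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)  # placeholder generation settings
    return {"generated_text": tokenizer.decode(outputs[0], skip_special_tokens=True)}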
Here’s my deployment code:
import json

config = {
    'HF_MODEL_ID': "/opt/ml/model",
    'SM_NUM_GPUS': json.dumps(1),          # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2048),  # Max length of the input text
    'MAX_TOTAL_TOKENS': json.dumps(4096),  # Max length of the generation (including input text)
}
The llm_image I get from get_huggingface_llm_image_uri is:

llm image uri: 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.0.3-gpu-py39-cu118-ubuntu20.04
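For completeness, the rest of my deployment call looks roughly like this (a sketch rather than the exact code; the S3 path, role, and instance type below are placeholders):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI container image, which resolves to the URI shown above
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.0.3")

llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    model_data="s3://my-bucket/llama2/model.tar.gz",  # placeholder: fine-tuned artifacts + code/inference.py
    env=config,
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
    container_startup_health_check_timeout=300,
)

With HF_MODEL_ID set to "/opt/ml/model", the endpoint fails to start with: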
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/opt/ml/model'. Use `repo_type` argument if needed.
I understand that I should change HF_MODEL_ID, but if I give the repo_name from the Hugging Face Hub, will it still consider the artifacts in the tar.gz file?
Even after providing a Hugging Face Hub id, it does take the model from the tar.gz file, but it then searches for a .bin file. My model is currently in safetensors format, so is there a hack to get past this?
2023-11-24T11:00:07,577 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Failed to load the model: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /opt/ml/model.
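One workaround I am considering is re-saving the fine-tuned weights as pytorch_model.bin before building the tar.gz, roughly like this (a sketch; the local paths are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

src = "./llama2-finetuned"       # placeholder: directory holding the safetensors checkpoint
dst = "./llama2-finetuned-bin"   # placeholder: directory to repackage into model.tar.gz

model = AutoModelForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)

# safe_serialization=False writes pytorch_model.bin instead of model.safetensors
model.save_pretrained(dst, safe_serialization=False)
tokenizer.save_pretrained(dst)

But I would rather avoid duplicating the weights if the container can be made to load safetensors directly.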