Deploying a custom inference script with a fine-tuned Llama 2 model


I have used the following code to deploy a fine-tuned Llama 2 model with a custom inference script. The problem is that the custom script is not being used: I have made changes to `predict_fn` and `model_fn`, but the endpoint returns the usual response.
Here’s my code:

config = {
  'HF_MODEL_ID': "/opt/ml/model",
  'SM_NUM_GPUS': json.dumps(1),  # Number of GPUs used per replica
  'MAX_INPUT_LENGTH': json.dumps(2048),  # Max length of input text
  'MAX_TOTAL_TOKENS': json.dumps(4096),  # Max length of the generation (including input text)
}

hf_model = HuggingFaceModel(
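For reference, a full TGI deployment call typically looks roughly like the following. This is a hedged sketch, not the exact code from the post: `role`, `llm_image`, `s3_model_uri`, and the instance type are assumptions standing in for the truncated values above.

```python
import json
from sagemaker.huggingface import HuggingFaceModel

# Sketch of a TGI (LLM container) deployment. Note the TGI container
# runs its own server and ignores any custom inference.py bundled in
# the artifact -- which is why script changes have no effect here.
config = {
    'HF_MODEL_ID': "/opt/ml/model",        # load weights from the mounted artifact
    'SM_NUM_GPUS': json.dumps(1),
    'MAX_INPUT_LENGTH': json.dumps(2048),
    'MAX_TOTAL_TOKENS': json.dumps(4096),
}

hf_model = HuggingFaceModel(
    role=role,                 # assumed: your SageMaker execution role
    image_uri=llm_image,       # from get_huggingface_llm_image_uri
    model_data=s3_model_uri,   # tar.gz with the model artifacts and code/
    env=config,
)

predictor = hf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumed single-GPU instance type
)
```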


The `s3_model_uri` holds the model artifacts along with the code. The files are as follows:

|-- model artifacts
|-- code
|   |--
|   |--

The `llm_image` I get from `get_huggingface_llm_image_uri` is:
llm image uri:

Any help on this would be appreciated.


You cannot use the llm container with a custom inference script.

I have modified the code as suggested, without any modifications to the above config:

hf_model = HuggingFaceModel(
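For a custom `inference.py`, the usual pattern is the standard Hugging Face inference container (no `get_huggingface_llm_image_uri`), which does run the handler code bundled in `code/`. A hedged sketch, with `role`, `s3_model_uri`, the framework versions, and the instance type as assumptions:

```python
from sagemaker.huggingface import HuggingFaceModel

# Sketch: deploy with the standard HF inference container so the
# inference.py inside the artifact's code/ directory is actually used.
# Do NOT set HF_MODEL_ID in env here -- the toolkit loads the weights
# from the unpacked model_data automatically, and "/opt/ml/model" is
# not a valid Hub repo id, which triggers the HFValidationError.
hf_model = HuggingFaceModel(
    role=role,                     # assumed: SageMaker execution role
    model_data=s3_model_uri,       # tar.gz containing weights and code/inference.py
    transformers_version="4.28",   # assumed versions; match your model's needs
    pytorch_version="2.0",
    py_version="py310",
)

predictor = hf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumed instance type
)
```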


I get the following error in the logs:

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/opt/ml/model'. Use `repo_type` argument if needed.

I understand that I should change `HF_MODEL_ID`, but if I give a repo_name from the Hugging Face Hub, will it still consider the artifacts in the tar.gz file?


Even after providing a hf_hub id, it does take the model from the tar.gz file, but it searches for a .bin file. I currently have the model in safetensors format, so is there a workaround to get past this?

2023-11-24T11:00:07,577 [INFO ] W-9000-model-stdout - Failed to load the model: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /opt/ml/model.
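The error means this (non-TGI) toolkit only looks for the legacy weight file names listed in the message. A small stdlib helper for checking a model directory before packaging it — `has_legacy_weights` is a hypothetical function for illustration, not part of any library:

```python
import os

# Weight filenames the toolkit looks for, per the error message above
# (this check predates safetensors support).
LEGACY_WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)

def has_legacy_weights(model_dir: str) -> bool:
    """Return True if the directory contains a weight file the toolkit can load."""
    return any(os.path.exists(os.path.join(model_dir, f)) for f in LEGACY_WEIGHT_FILES)
```

A directory containing only `model.safetensors` fails this check, which is exactly the situation in the log above.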

What I did is just convert the safetensors file into a .bin file:

import safetensors.torch
import torch

# Load the safetensors weights, then re-save them in the legacy .bin format
pt_state_dict = safetensors.torch.load_file(model_path, device="cpu")
torch.save(pt_state_dict, "pytorch_model.bin")

Is this the only fix for this problem?

So far, that's the only approach I've been able to find.