Deploying a custom inference script with a fine-tuned Llama 2 model

Hello

I have used the following code to deploy a fine-tuned Llama 2 model with a custom inference script. The problem I have is that the custom inference.py script is not being used: I have made changes to predict_fn and model_fn, but the endpoint returns the usual (default) response.
Here’s my code:

import json

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

config = {
    'HF_MODEL_ID': "/opt/ml/model",
    'SM_NUM_GPUS': json.dumps(1),          # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2048),  # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(4096),  # max length of the generation (including input text)
}
hf_model = HuggingFaceModel(
    model_data=s3_model_uri,
    role=role,
    image_uri=llm_image,
    env=config,
)
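
The endpoint is then deployed and invoked along these lines (a sketch; the instance type, prompt, and generation parameters are just placeholders):

# sketch of the deploy/invoke step (instance type and payload are placeholders)
predictor = hf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

response = predictor.predict({
    "inputs": "What does the fine-tuned model say to this?",
    "parameters": {"max_new_tokens": 128},
})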

The s3_model_uri points to the model artifacts along with the inference.py code. The files are laid out as follows:

model.tar.gz
|-- model artifacts
|-- code
|   |-- inference.py
|   |-- generate.py
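
The model_fn/predict_fn overrides in code/inference.py follow this general shape (a simplified sketch, not the exact script; the generation settings are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # model_dir is /opt/ml/model, where model.tar.gz is unpacked
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype=torch.float16, device_map="auto"
    )
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # data is the deserialized request payload, e.g. {"inputs": "..."}
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}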

The llm_image I get from get_huggingface_llm_image_uri is:
llm image uri: 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.0.3-gpu-py39-cu118-ubuntu20.04

Any help on this would be appreciated.
Thanks


You cannot use the LLM (TGI) container with a custom inference script; it ships its own serving stack and ignores the code/inference.py in your model archive. Use the standard Hugging Face inference container instead (drop image_uri and specify the framework versions).

I have modified the code as suggested (dropping image_uri) and provided the following, without any changes to the above config:

hf_model = HuggingFaceModel(
    model_data=s3_model_uri,
    role=role,
    env=config,
    transformers_version="4.28.1",
    pytorch_version="2.0.0",
    py_version="py310",
)

I get the following error in the logs:


huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/opt/ml/model'. Use `repo_type` argument if needed.

I understand that I should change HF_MODEL_ID, but if I give a repo name from the Hugging Face Hub, will it still use the artifacts in the tar.gz file?
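
My current guess is that, with this container, the TGI-specific keys should be dropped entirely and the model loaded purely from model_data, something like this (untested; the HF_TASK value is an assumption and may be unnecessary when predict_fn is overridden):

config = {
    'HF_TASK': 'text-generation',  # assumed task; the toolkit loads the weights from /opt/ml/model
}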


Even after providing a Hugging Face Hub id, it does take the model from the tar.gz file, but it searches for a .bin file. My model is currently in safetensors format, so is there a hack to get past this?

2023-11-24T11:00:07,577 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Failed to load the model: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /opt/ml/model.


What I did is just convert the safetensors file into a .bin file:

import safetensors.torch
import torch

# load the safetensors weights and re-save them in the .bin format the toolkit expects
model_path = "path/to/your/model/.safetensors"
pt_state_dict = safetensors.torch.load_file(model_path, device="cpu")
torch.save(pt_state_dict, "pytorch_model.bin")
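
After that, the resulting pytorch_model.bin presumably needs to go back into model.tar.gz alongside the config and tokenizer files before re-uploading to S3.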

Is that the only fix to this problem?

So far, that’s the only thing I’ve been able to find.