I used the following code to deploy a fine-tuned Llama 2 model with a custom inference script. The problem is that the custom inference.py script is not being used: I have made changes to predict_fn and model_fn, but the endpoint still returns the usual response.
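For context, my inference.py defines the standard model_fn/predict_fn handlers used by the SageMaker Hugging Face toolkit. This is only a simplified sketch of its structure (not my exact script; the loading and generation parameters are placeholders):

# inference.py (simplified sketch, placeholder logic)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # model_dir is where SageMaker extracts model.tar.gz (/opt/ml/model)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16).to("cuda")
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)  # placeholder generation settings
    return {"generated_text": tokenizer.decode(outputs[0], skip_special_tokens=True)}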
Here’s my deployment code:
import json

config = {
    'HF_MODEL_ID': "/opt/ml/model",
    'SM_NUM_GPUS': json.dumps(1),          # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2048),  # Max length of the input text
    'MAX_TOTAL_TOKENS': json.dumps(4096),  # Max length of the generation (including input text)
}
The llm_image I get from get_huggingface_llm_image_uri is:

llm image uri: 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.0.3-gpu-py39-cu118-ubuntu20.04
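For completeness, the rest of my deployment call looks roughly like this (a sketch rather than the exact code; the S3 path, role, and instance type below are placeholders):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI container image, which resolves to the URI shown above
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.0.3")

llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    model_data="s3://my-bucket/llama2/model.tar.gz",  # placeholder: fine-tuned artifacts + code/inference.py
    env=config,
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
    container_startup_health_check_timeout=300,
)

With HF_MODEL_ID set to "/opt/ml/model", the endpoint fails to start with: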
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/opt/ml/model'. Use `repo_type` argument if needed.
I understand that I should change HF_MODEL_ID, but if I give the repo_name from the Hugging Face Hub, will it still consider the artifacts in the tar.gz file?
Even after providing a Hugging Face Hub id, it does take the model from the tar.gz file, but it then searches for a .bin file. My model is currently in safetensors format, so is there a hack to get past this?
2023-11-24T11:00:07,577 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Failed to load the model: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /opt/ml/model.
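One workaround I am considering is re-saving the fine-tuned weights as pytorch_model.bin before building the tar.gz, roughly like this (a sketch; the local paths are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

src = "./llama2-finetuned"       # placeholder: directory holding the safetensors checkpoint
dst = "./llama2-finetuned-bin"   # placeholder: directory to repackage into model.tar.gz

model = AutoModelForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)

# safe_serialization=False writes pytorch_model.bin instead of model.safetensors
model.save_pretrained(dst, safe_serialization=False)
tokenizer.save_pretrained(dst)

But I would rather avoid duplicating the weights if the container can be made to load safetensors directly.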