HuggingFaceModel ignores code directory

Hi,

I am facing some issues trying to initialize an endpoint with custom inference code for Llama 3.2 3B. Specifically, the endpoint launches, but it completely ignores the requirements.txt and the inference.py code.

Here is the SageMaker code used to create and deploy the model:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

model_data = "s3://custom-llamas/model.tar.gz"
hf_token = "..."
hub = {
    "SM_MODEL_DIR": model_data,
    "HF_MODEL_ID": "/opt/ml/model",
    "SM_NUM_GPUS": json.dumps(1),
    "HF_TOKEN": hf_token,
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    name="hf-llama-model-custom",
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.2.3"),
    env=hub,
    role=role,
    model_data=model_data,
    entry_point="inference.py",
    source_dir="code",
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    endpoint_name="hf-llama-endpoint",
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
    endpoint_logging=True,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

The contents of model.tar.gz are the following:

$ tar -tzf model.tar.gz
./
./model.safetensors.index.json
./generation_config.json
./tokenizer_config.json
./config.json
./.gitattributes
./USE_POLICY.md
./special_tokens_map.json
./model-00002-of-00002.safetensors
./model-00001-of-00002.safetensors
./original/
./original/orig_params.json
./original/params.json
./original/tokenizer.model
./original/consolidated.00.pth
./code/
./code/requirements.txt
./code/inference.py
./tokenizer.json
./LICENSE.txt
./README.md

I have tried both with the local /code/ directory in scope (so that the model.tar.gz is re-created when deploy is called) and without it (since the code is the same as what is already inside the model_data archive). The logs indicate that the endpoint is reading the local model (loading from /opt/ml/model), but it completely ignores the code directory.

The container should be installing the packages from requirements.txt, but nothing about them appears in the CloudWatch logs, so this probably does not happen. To check whether the code directory is read at all, I added some packages that the endpoint does not actually need, just to see whether they would show up in the logs; since they are not required at runtime (and I will remove them later), I cannot tell from the endpoint's behavior whether they were installed.
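
For reference, the check looked roughly like this; the extra package is an arbitrary placeholder, not something the model needs:

# code/requirements.txt (contents illustrative)
transformers
tabulate  # sentinel only, to see whether pip ever runs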

However, my custom script expects a different JSON input from the default one: it does not include the inputs key, and it contains some extra logic that compiles the input prompt. The endpoint spawns, but it behaves with the default logic (i.e., it expects the inputs key) and never runs the custom code. I also added some print/logger messages to the custom functions, but nothing showed up.
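
For illustration, the mismatch looks roughly like this; the custom field names below are placeholders, not my real schema:

# Default TGI request format, which the endpoint still expects:
predictor.predict({"inputs": "What is the capital of France?"})

# The kind of payload my custom input_fn should handle (fields illustrative):
predictor.predict({
    "question": "What is the capital of France?",
    "style": "concise",
})
# -> handled with the default logic instead, since inference.py is never loaded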

These are the function signatures in inference.py:

def model_fn(model_dir):

def input_fn(input_data, content_type):

def predict_fn(data, pipe):

def output_fn(prediction, accept):
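
For completeness, here is a stripped-down sketch of the script with the logging I added to check whether the handlers are loaded at all; the real prompt-compilation logic is omitted and the keys are illustrative:

import json
import logging

from transformers import pipeline

logger = logging.getLogger(__name__)

def model_fn(model_dir):
    logger.info("model_fn called with %s", model_dir)  # never appears in CloudWatch
    return pipeline("text-generation", model=model_dir)

def input_fn(input_data, content_type):
    logger.info("input_fn called")
    body = json.loads(input_data)
    # ... custom logic that compiles the prompt from the request fields ...
    return body

def predict_fn(data, pipe):
    logger.info("predict_fn called")
    return pipe(data["prompt"])  # "prompt" key is illustrative

def output_fn(prediction, accept):
    logger.info("output_fn called")
    return json.dumps(prediction)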

Any suggestions? Did I miss some configuration anywhere?

Might it be this case?

According to the link, we need to download the model artifacts and push them to S3 before using the custom inference script.

I can confidently say that I did this correctly in both cases.
In the first case, I downloaded the model artifacts and repackaged everything (as the first link shows) with the code directory included; see the rough sketch below.
In the second case, I used entry_point and source_dir to re-package the model artifacts with the code directory on the fly before creating the endpoint.
In both cases (although, as far as I know, I cannot actually confirm it), the code directory should be inside the deployed image. It is just ignored.
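
For the first case, the repackaging looked roughly like this; the repo id and paths are illustrative:

import tarfile

from huggingface_hub import snapshot_download

# download the model artifacts locally (repo id illustrative)
snapshot_download(
    repo_id="meta-llama/Llama-3.2-3B-Instruct",
    local_dir="model",
)

# code/inference.py and code/requirements.txt were copied into model/code/ beforehand
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model", arcname=".")

# then the archive was pushed to s3://custom-llamas/model.tar.gz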

The only assumption I can make at this point is that the TGI image ignores any custom code and only serves the model in the default way. But, as far as I understand, that is not hinted at anywhere.
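
If that assumption holds, I suppose the workaround would be to drop the TGI image_uri entirely and let the SDK resolve the plain Hugging Face inference container, which, as far as I understand, is the one that honors entry_point and source_dir. Something like the sketch below, where the version pins are guesses that would need to be checked against the supported DLCs:

huggingface_model = HuggingFaceModel(
    name="hf-llama-model-custom",
    model_data=model_data,
    role=role,
    entry_point="inference.py",
    source_dir="code",
    # no image_uri: let the SDK pick the inference toolkit container
    transformers_version="4.37",  # assumption: check supported combinations
    pytorch_version="2.1",
    py_version="py310",
)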
