Hi,

I am facing some issues trying to initialize an endpoint with custom inference code for Llama 3.2 3B. Specifically, the endpoint launches, but it completely ignores the requirements.txt and inference.py code.

Here is the SageMaker code used to create and deploy the model:
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

model_data = "s3://custom-llamas/model.tar.gz"
hf_token = "..."

hub = {
    "SM_MODEL_DIR": model_data,
    "HF_MODEL_ID": "/opt/ml/model",
    "SM_NUM_GPUS": json.dumps(1),
    "HF_TOKEN": hf_token,
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    name="hf-llama-model-custom",
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.2.3"),
    env=hub,
    role=role,
    model_data=model_data,
    entry_point="inference.py",
    source_dir="code",
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    endpoint_name="hf-llama-endpoint",
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
    endpoint_logging=True,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
The contents of model.tar.gz are the following:
$ tar -tzf model.tar.gz
./
./model.safetensors.index.json
./generation_config.json
./tokenizer_config.json
./config.json
./.gitattributes
./USE_POLICY.md
./special_tokens_map.json
./model-00002-of-00002.safetensors
./model-00001-of-00002.safetensors
./original/
./original/orig_params.json
./original/params.json
./original/tokenizer.model
./original/consolidated.00.pth
./code/
./code/requirements.txt
./code/inference.py
./tokenizer.json
./LICENSE.txt
./README.md
I have tried both with the code/ directory in scope (passing source_dir so that model.tar.gz is re-created when deploy is called) and without it, since the code is identical to what is already inside model_data. The logs indicate that the endpoint is reading the local model (loading from /opt/ml/model), but it completely ignores the code directory. It should be installing the packages from requirements.txt, yet no installation output appears in the CloudWatch logs, so that probably never happens. I also added some extra packages to requirements.txt to check whether the code directory is read and whether they show up in the logs; since they are not actually required by the endpoint and will be removed later, I cannot tell from its behavior whether they were installed.
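For reference, packaging along the lines of the sketch below reproduces the archive layout shown above, with code/ at the top level next to the model files (llama-3.2-3b/ is just a placeholder for the local folder holding the model files and the code/ subdirectory, not the actual path I use):

import tarfile
from pathlib import Path

# Placeholder: local folder containing the model files plus a code/
# subdirectory with inference.py and requirements.txt.
src = Path("llama-3.2-3b")

# Pack everything at the top level of the archive, mirroring the
# `tar -tzf` listing above (./config.json, ./code/inference.py, ...).
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for item in sorted(src.iterdir()):
        tar.add(item, arcname=f"./{item.name}")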
However, my custom script expects a different JSON input from the default one: it does not include the inputs key and instead uses its own logic to compile the input prompt. The endpoint does spawn, but it behaves with the default logic (i.e., it expects the inputs key) and does not work with the custom code. I also added some print/logger statements to the custom functions, but nothing shows up in the logs.
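To make the mismatch concrete, this is roughly how the two request shapes differ when calling the predictor from the deploy snippet above (the field names in the custom payload are illustrative placeholders, not the actual schema of my script):

# Default TGI-style request, which is the only shape the endpoint accepts:
default_payload = {
    "inputs": "What is SageMaker?",
    "parameters": {"max_new_tokens": 128},
}

# The kind of request my custom input_fn is written for (placeholder keys;
# the real script compiles the prompt from its own fields, without "inputs"):
custom_payload = {
    "query": "What is SageMaker?",
    "context": "...",
}

# Only the default shape works; the custom one is rejected because the
# default handler logic still expects the "inputs" key.
response = predictor.predict(default_payload)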
Function signatures in inference.py:
def model_fn(model_dir):
def input_fn(input_data, content_type):
def predict_fn(data, pipe):
def output_fn(prediction, accept):
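For context, the script follows the standard SageMaker Hugging Face handler pattern. A simplified stand-in (not the actual script; the prompt-compilation logic and field names here are placeholders) looks like this:

import json
import logging

from transformers import pipeline

logger = logging.getLogger(__name__)


def model_fn(model_dir):
    # Load the model that SageMaker extracted to /opt/ml/model.
    logger.info("model_fn called with model_dir=%s", model_dir)
    return pipeline("text-generation", model=model_dir, device_map="auto")


def input_fn(input_data, content_type):
    # Parse the custom JSON payload; "query"/"context" are placeholders for
    # the real prompt-compilation logic.
    logger.info("input_fn called with content_type=%s", content_type)
    payload = json.loads(input_data)
    prompt = f"{payload.get('context', '')}\n\n{payload['query']}"
    return {"prompt": prompt, "parameters": payload.get("parameters", {})}


def predict_fn(data, pipe):
    logger.info("predict_fn called")
    return pipe(data["prompt"], **data["parameters"])


def output_fn(prediction, accept):
    logger.info("output_fn called with accept=%s", accept)
    return json.dumps(prediction)

None of these log messages ever appear in CloudWatch.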
Any suggestions? Did I miss some configuration anywhere?