Hi, I’m trying to use SageMaker’s Batch Transform utility to run LLM inference with a Llama-3 8B-Instruct model.
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import json
generate_parameters = {
    'temperature': '0.6',
    'top_p': '0.9',
    'do_sample': 'True',
    'max_new_tokens': '256',
    'return_full_text': 'False'  # ensures the input text is not included in the output
}
# Hub config (real values elided in my notebook; HF_MODEL_ID matches the model above)
hub = {
    'HF_MODEL_ID': 'meta-llama/Meta-Llama-3-8B-Instruct',
    'SM_NUM_GPUS': '1',  # suits the single-GPU g5.2xlarge below
}
role = sagemaker.get_execution_role()  # standard SageMaker execution role
# hub.update({'HF_PARAMETERS': json.dumps(generate_parameters)})
huggingface_model = HuggingFaceModel(
    env=hub,  # configuration for loading the model from the Hub
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
)
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.g5.2xlarge',
    output_path=s3_output_data_path,
    strategy='SingleRecord',
    env=generate_parameters,  # attempting to pass generation params as container env vars
)
batch_job.transform(
    data=s3_input_data_path,
    content_type='application/json',
    split_type='Line',
)
No matter what I do, I can’t get this to actually set the LLM generation parameters: temperature, top_p, etc. all come through as None. I’d appreciate it if someone could take a look and tell me whether I’m passing the parameters in the wrong place.
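To be clear about what I’m expecting: a TGI request body carries these settings under a "parameters" key, so I assumed the env vars above would apply the same values as defaults. This sketch just shows that shape (native types here, not the env-var strings; the prompt is illustrative):

```python
import json

# The generation parameters I'm trying to apply, in the shape a TGI
# request body carries them (native types, not env-var strings).
payload = {
    "inputs": "What is the capital of France?",  # illustrative prompt
    "parameters": {
        "temperature": 0.6,
        "top_p": 0.9,
        "do_sample": True,
        "max_new_tokens": 256,
        "return_full_text": False,
    },
}
print(json.dumps(payload))
```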