Error in batch transform job with Huggingface model and SageMaker

In case anyone is still struggling with this, here is what I think is happening (which may or may not be true, but it’s working for me now):
When using an input filter, you are applying a JSONPath filter. If your input is of the form [{"inputs": "some string", "id": 12345}, ...], then using $.inputs as the input filter means the model only receives the strings stored under the "inputs" key, i.e. ["some string", "another string", ...].

This causes issues with the following line in the sagemaker-huggingface-inference-toolkit handler_service.py script: inputs = data.pop("inputs", data) (line 167; not sure how to link to it directly). The pop method won't work on a bare list of strings, so it throws an error at this point.

I then realized that I don't even need an input filter, since the text input will be "popped" anyway, as long as it is stored under "inputs" in the input data. So I just leave out the input filter, and instead join the entire output with the original (un-popped) input, like so:
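To make the failure mode concrete, here is a small reproduction of what that line effectively does (extract_inputs is my own stand-in for the toolkit's line 167, not the toolkit's actual function):

```python
import json

# Stand-in for the toolkit's line 167: inputs = data.pop("inputs", data)
def extract_inputs(data):
    return data.pop("inputs", data)

# Without an input filter, each JSON line deserializes to a dict, so pop works:
record = json.loads('{"inputs": "some string", "id": 12345}')
assert extract_inputs(record) == "some string"

# With $.inputs applied, the payload is no longer a dict. A bare string has
# no .pop() at all, and list.pop() expects an integer index, so either way
# the handler blows up on this line:
try:
    extract_inputs("some string")
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'pop'

try:
    extract_inputs(["some string", "another string"])
except TypeError as err:
    print(err)
```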

batch_job.transform(
    data=input_s3_uri,
    data_type="S3Prefix",
    content_type="application/json",
    split_type="Line",
    join_source="Input",
    output_filter="$['id','inputs','SageMakerOutput']"
)
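With join_source="Input" and that output filter, each output line should look roughly like the record below. Treat the exact SageMakerOutput payload as an assumption; its shape depends on the pipeline task (this example assumes text classification):

```python
import json

# Hypothetical joined output line for a text-classification model;
# the exact SageMakerOutput payload depends on the pipeline task.
line = '{"id": 12345, "inputs": "some string", "SageMakerOutput": {"label": "POSITIVE", "score": 0.98}}'
rec = json.loads(line)
print(rec["SageMakerOutput"]["label"])  # POSITIVE
```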

The labels and scores are contained in "SageMakerOutput". I then use a custom post-processing function that reads the JSON Lines output, transforms it into a pandas DataFrame, and writes that to S3.
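For the post-processing step, a minimal sketch of the JSONL-to-DataFrame function might look like this (jsonl_to_dataframe is my own name, and the field layout is assumed to match the output_filter above; the S3 write is left out):

```python
import json
import pandas as pd

def jsonl_to_dataframe(path):
    """Flatten batch-transform JSON Lines output into one row per record.

    Assumes each line looks like
    {"id": ..., "inputs": ..., "SageMakerOutput": {"label": ..., "score": ...}},
    matching the output_filter used above.
    """
    rows = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            out = rec.pop("SageMakerOutput", {})
            # Depending on the task, SageMakerOutput can be a single dict
            # or a list of {label, score} dicts; take the top prediction.
            if isinstance(out, list):
                out = out[0] if out else {}
            rows.append({**rec, **out})
    return pd.DataFrame(rows)
```

From there, something like df.to_parquet or df.to_csv against an s3:// path (with s3fs installed) handles the write to S3.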
Like I said, this is just what worked for me, but I thought I'd share it because this issue gave me headaches for quite a while and there doesn't seem to be clear guidance or documentation on it.