Error in batch transform job with Huggingface model and SageMaker

In case anyone is still struggling with this, here is what I think is happening (which may or may not be true, but it’s working for me now):
When using an input filter, you are applying a JSONPath filter. If your input is of the form [{"inputs": "some string", "id": 12345}, ...], then using $.inputs as the input filter means the model only receives the strings stored under the "inputs" key, i.e. ["some string", "another string", ...].

This causes issues with the following line in the sagemaker-huggingface-inference-toolkit handler_service.py script: inputs = data.pop("inputs", data) (line 167; not sure how to link to it directly). The pop method won't work on a bare list of strings, so it throws an error at this point.

I then realized that I don't even need an input filter, since the text input will be "popped" anyway, as long as it is stored under "inputs" in the input data. So I just leave out the input filter, and instead join the entire output with the original (un-popped) input, like so:
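To make the failure mode concrete, here is a small reproduction of what that line effectively does (extract_inputs is my own stand-in for the toolkit's line 167, not the toolkit's actual function):

```python
import json

# Stand-in for the toolkit's line 167: inputs = data.pop("inputs", data)
def extract_inputs(data):
    return data.pop("inputs", data)

# Without an input filter, each JSON line deserializes to a dict, so pop works:
record = json.loads('{"inputs": "some string", "id": 12345}')
assert extract_inputs(record) == "some string"

# With $.inputs applied, the payload is no longer a dict. A bare string has
# no .pop() at all, and list.pop() expects an integer index, so either way
# the handler blows up on this line:
try:
    extract_inputs("some string")
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'pop'

try:
    extract_inputs(["some string", "another string"])
except TypeError as err:
    print(err)
```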

batch_job.transform(
    data=input_s3_uri,
    data_type="S3Prefix",
    content_type="application/json",
    split_type="Line",
    join_source="Input",
    output_filter="$['id','inputs','SageMakerOutput']"
)
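With join_source="Input" and that output filter, each output line should look roughly like the record below. Treat the exact SageMakerOutput payload as an assumption; its shape depends on the pipeline task (this example assumes text classification):

```python
import json

# Hypothetical joined output line for a text-classification model;
# the exact SageMakerOutput payload depends on the pipeline task.
line = '{"id": 12345, "inputs": "some string", "SageMakerOutput": {"label": "POSITIVE", "score": 0.98}}'
rec = json.loads(line)
print(rec["SageMakerOutput"]["label"])  # POSITIVE
```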

The labels and scores are contained in "SageMakerOutput". I then use a custom post-processing function that reads the JSON Lines output, transforms it into a pandas DataFrame, and writes that to S3.
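For the post-processing step, a minimal sketch of the JSONL-to-DataFrame function might look like this (jsonl_to_dataframe is my own name, and the field layout is assumed to match the output_filter above; the S3 write is left out):

```python
import json
import pandas as pd

def jsonl_to_dataframe(path):
    """Flatten batch-transform JSON Lines output into one row per record.

    Assumes each line looks like
    {"id": ..., "inputs": ..., "SageMakerOutput": {"label": ..., "score": ...}},
    matching the output_filter used above.
    """
    rows = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            out = rec.pop("SageMakerOutput", {})
            # Depending on the task, SageMakerOutput can be a single dict
            # or a list of {label, score} dicts; take the top prediction.
            if isinstance(out, list):
                out = out[0] if out else {}
            rows.append({**rec, **out})
    return pd.DataFrame(rows)
```

From there, something like df.to_parquet or df.to_csv against an s3:// path (with s3fs installed) handles the write to S3.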
Like I said, this is just what worked for me, but I thought I'd share it because this issue gave me headaches for quite a while and there doesn't seem to be clear guidance or documentation on it.