ClientError: 400 when using batch transformer for inference

Hi everyone,
I'm trying to do sentiment analysis on a bunch of data, following the example notebook notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub. Below is my code:

from sagemaker.huggingface.model import HuggingFaceModel

# Hub model configuration for the Inference Toolkit
hub = {
    'HF_MODEL_ID': 'cardiffnlp/twitter-roberta-base-sentiment',
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,  # SageMaker execution role, e.g. from sagemaker.get_execution_role()
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
)

# create Transformer to run our batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path=output_s3_path,  # S3 URI for the job output
    strategy='SingleRecord'
)

# start batch transform job, using the S3 data as input
batch_job.transform(
    data=input_s3_path,  # S3 URI of the input JSONL file
    content_type='application/json',
    split_type='Line'
)

My input jsonl file is formatted as one JSON object per line, for example:
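Illustrative placeholder rows (not my actual data):

{"inputs": "I absolutely love the new update!"}
{"inputs": "This is the worst experience I have had in years"}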

The model infers successfully when I feed it the first 10 rows (the full dataset is about 6k rows),
but it throws errors when I expand to 100 rows.

I’m super new to SageMaker and Hugging Face. Can anyone tell me what I’m missing? Thank you!

Sorry, it’s still me; just attaching more pics so the context can be better understood.

I heard BERT has a limitation on text length, so I truncated each line to 460 words.
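Roughly like this, in case it matters (a minimal sketch; text stands in for one line of my data):

text = "one line from my dataset"  # placeholder
truncated = " ".join(text.split()[:460])  # keep at most the first 460 whitespace-separated words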

Hi miOmiO - did the truncation fix your problem? If not, would you mind sharing what error message you are getting?

Hi @marshmellow77, yes, truncation makes some long texts acceptable to the model, but not all of them,
even when I set the length to 460 words. I tried going from 500 down to 460. I have no idea whether I should keep reducing this or whether some other modification would help.

@miOmiO you have created a second thread: ClientError:400 when using batch transformer on sagemaker for inference,
where I responded. Can this one be closed, or are these two different threads?

Hi @philschmid, sure, please kindly close this post; I will update on that one. Thank you.

Thanks for letting me know. Here is the response again:

Hey @miOmiO,

Happy to help you here! To narrow down your issue, I think the first step would be to check whether an input file created as in the sample (notebooks/sagemaker-notebook.ipynb at master · huggingface/notebooks · GitHub) works, or whether it also errors out.
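For reference, a minimal sketch of producing such a JSONL input file (the sentences and filename are placeholders):

import json

texts = ["I love this!", "This is terrible."]  # placeholder sentences

# write one {"inputs": ...} JSON object per line, as the batch transform expects
with open("tweet_data.jsonl", "w") as f:
    for text in texts:
        f.write(json.dumps({"inputs": text}) + "\n")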

Additionally, could you bump the versions of the HuggingFaceModel to the latest ones? For transformers_version that's 4.12.3 and for pytorch_version it's 1.9.1; maybe this already solves your issue. You can find the list of available containers here: Reference
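A minimal sketch of that bump (assuming py38 is the Python tag paired with these framework versions):

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py38',  # assumption: the Python tag shipped with these versions
)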

Also worth testing: replace your model with a different one, e.g. distilbert-base-uncased-finetuned-sst-2-english

Hi @philschmid ,
Thank you for the reply! I checked all three approaches:
1. The input format complies with the JSON format from the sample notebook.
2. I used the latest framework versions as suggested.
3. I also switched to a different model to see if it was a model-specific issue.

It still throws ClientError: 400, but I notice that as long as I truncate each line of text to 60 words (a number I picked at random), both models work fine. Is this related to the text length limitation? If so, is there any way to specify the length of the text when building the batch job?

P.S. My dataset is around 6k rows; some rows have more than 512 words/tokens.

Hi @miOmiO - the text length limitation could indeed be the issue here. Note that the length refers to the number of tokens, not the number of words. Because BERT models generally use subword tokenization, one word can be split into 2 or more tokens. That is why even reducing the number of words to 460 sometimes still throws an error.
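You can see this locally with a quick check (a minimal sketch; the sample sentence is made up):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")
text = "Unbelievably, the overparameterized tokenizer subdivides uncommon words"
tokens = tokenizer.tokenize(text)
print(len(text.split()), "words ->", len(tokens), "tokens")  # tokens usually outnumber words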

To test this you could run the model row by row and check whether the examples that fail are the same ones that fail in your batch job. If it is indeed the number of tokens that causes the model to fail, you should see an error message like "... sequence length is longer than the specified maximum sequence length for this model ..."
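Something along these lines, run locally (a sketch; input.jsonl stands in for your file):

import json
from transformers import pipeline

classifier = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment")

with open("input.jsonl") as f:
    for i, line in enumerate(f):
        try:
            classifier(json.loads(line)["inputs"])
        except Exception as e:  # record which rows the model chokes on
            print(f"row {i} failed: {e}")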

If this is indeed the source of the error, it might be easiest to truncate the input sequence of tokens after tokenization (rather than truncating the number of words before tokenization).
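For example, reusing the tokenizer and text from the snippet above (512 is the usual RoBERTa limit):

# encode with token-level truncation, then decode back to a string if needed
encoded = tokenizer(text, truncation=True, max_length=512)
truncated_text = tokenizer.decode(encoded["input_ids"], skip_special_tokens=True)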

Hope that helps.

The model cardiffnlp/twitter-roberta-base-sentiment doesn't have a max length defined. I tried to reach out to the authors, but they haven't responded. See Add `tokenizer_max_length` to `cardiffnlp/twitter-roberta-base-sentiment` · Issue #13459 · huggingface/transformers · GitHub

You could “fork” the model → create a new model repository, push the weights, and add the tokenizer_config; then truncation=True should work properly.
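A rough sketch of that fork (the target repo name is hypothetical, and you need to be logged in, e.g. via huggingface-cli login):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "your-username/twitter-roberta-base-sentiment-maxlen"  # hypothetical repo name

model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")
tokenizer.model_max_length = 512  # add the missing max length to the tokenizer config

model.push_to_hub(repo)
tokenizer.push_to_hub(repo)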

Hi @marshmellow77, thank you for leading me to think about subword tokenization.

Hi @philschmid, I got the solution from another related post, with your help as well:
How are the inputs tokenized when model deployment? - Amazon SageMaker - Hugging Face Forums

After I switched to another model and changed the input JSON file format to this:
{"inputs": "long sentence 2", "parameters": {"truncation": true}}
the new model works well for me (as long as it has a 'max_length' attribute in its tokenizer config file).
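As I understand it from that post, the "parameters" object is forwarded to the pipeline as keyword arguments, so the equivalent local call would look roughly like this (the model name is just an example):

from transformers import pipeline

classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("long sentence 2", truncation=True))  # "parameters" become kwargs like this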
