ClientError:400 when using batch transformer on sagemaker for inference

Hello everyone,
I’m new to Hugging Face and am trying to run sentiment analysis on a batch of texts with a batch transform job. I followed the example notebook: notebooks/sagemaker-notebook.ipynb at master · huggingface/notebooks · GitHub
I rewrote the input JSON file so that each line has the form {"inputs": "xxxxxxxxxx"}, and truncated long texts to 460 words, since BERT models have a limit on input length.
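
As a minimal sketch of the preprocessing described above: the helper names (`to_json_line`, `write_jsonl`) are hypothetical and not taken from the notebook; only the 460-word cap and the {"inputs": ...} shape come from the post.

```python
import json

MAX_WORDS = 460  # word cap from the post; note that words are not model tokens


def to_json_line(text, max_words=MAX_WORDS):
    """Return one JSON Lines record of the form {"inputs": "..."}."""
    # split() also collapses newlines, so each record stays on a single line
    words = text.split()
    truncated = " ".join(words[:max_words])
    return json.dumps({"inputs": truncated})


def write_jsonl(texts, path):
    """Write one record per line, matching split_type="Line" in the transform call."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(to_json_line(text) + "\n")
```

A word is not the same unit as a token, so a 460-word cap does not by itself guarantee that the tokenized input stays within the model's limit.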

Here is my code:


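from sagemaker.huggingface import HuggingFaceModel

# role, input_s3_path and output_s3_path are assumed to be defined beforehand (not shown)

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'cardiffnlp/twitter-roberta-base-sentiment',
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
)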
# create Transformer to run our batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path=output_s3_path,
    strategy='SingleRecord')

# starts batch transform job and uses s3 data as input
batch_job.transform(
    data=input_s3_path,
    content_type="application/json",    
    split_type="Line")

I tested it with the first 10 rows (the data has 6k rows in total) and it produced output successfully. However, when I expanded the input to 100 rows, it failed with this client error:

2022-01-07T13:33:06.824-08:00 2022-01-07T21:33:03.067:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=SINGLE_RECORD
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.019:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: ClientError: 400
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.019:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl:
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.019:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: Message:
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: {
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: "code": 400,
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: "type": "InternalServerException",
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: "message": "CUDA error: device-side assert triggered"
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.021:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: }

Can anyone help me figure out what I’m missing? Any help is highly appreciated!

Hey @miOmiO,

Happy to help you here! To narrow down your issue, I think the first step would be to check whether the dataset created in the sample (notebooks/sagemaker-notebook.ipynb at master · huggingface/notebooks · GitHub) works, or whether it also errors out.

Additionally, could you bump the versions in HuggingFaceModel to the latest ones? For transformers_version that’s 4.12.3 and for pytorch_version it’s 1.9.1. Maybe this already solves your issue. You can find the list of available containers here: Reference
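
A rough sketch of what that version bump could look like. The two version strings are the ones mentioned above; the `py_version` value and the `build_model` helper are assumptions for illustration, so please check the container list for the exact combination.

```python
# Versions suggested above; py_version is an assumption, check the container list
model_kwargs = {
    "transformers_version": "4.12.3",
    "pytorch_version": "1.9.1",
    "py_version": "py38",  # assumption: interpreter tag paired with this container
}


def build_model(role, env):
    """Hypothetical helper: create the model class with the bumped versions."""
    # Imported lazily so the settings can be inspected without the SDK installed
    from sagemaker.huggingface import HuggingFaceModel

    return HuggingFaceModel(env=env, role=role, **model_kwargs)
```

Calling `build_model(role, hub)` would then replace the `HuggingFaceModel(...)` call from the question, with the rest of the transform code unchanged.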

It is also worth testing with a different model, e.g. distilbert-base-uncased-finetuned-sst-2-english.
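
For the model swap, only the hub configuration should need to change. A small sketch, where `hub_env` is a hypothetical helper and the task is assumed to stay text-classification:

```python
def hub_env(model_id, task="text-classification"):
    """Hypothetical helper: build the env dict passed to HuggingFaceModel."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


# Model from the question
original_env = hub_env("cardiffnlp/twitter-roberta-base-sentiment")
# Alternative model to test against the same 100-row input
control_env = hub_env("distilbert-base-uncased-finetuned-sst-2-english")
```

Running the same input with `control_env` should help show whether the error is tied to the model or to the data.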

Duplicate of : ClientErro:400 when using batch transformer for inference