Hello everyone,
I'm new to Hugging Face and am trying to run sentiment analysis on a batch of texts with a SageMaker batch transform job. I followed the example notebook: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub
I rewrote the input JSON file so that each line has the form {"inputs": "xxxxxxxxxx"}, and truncated long texts to 460 words, since BERT-style models have a limit on input length.
Here is my code:
from sagemaker.huggingface import HuggingFaceModel

# role, input_s3_path and output_s3_path are assumed to be defined earlier in the notebook

# Hub model configuration: model ID and task
hub = {
    'HF_MODEL_ID': 'cardiffnlp/twitter-roberta-base-sentiment',
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
)

# create Transformer to run our batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path=output_s3_path,
    strategy='SingleRecord')

# starts batch transform job and uses s3 data as input
batch_job.transform(
    data=input_s3_path,
    content_type='application/json',
    split_type='Line')
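For reference, the input preparation described above (one {"inputs": ...} object per line, with long texts cut to 460 words) can be sketched roughly as follows. This is a minimal sketch, not the exact code used for the job: the helper names `truncate_words` and `write_jsonl` are hypothetical, and the word limit is simply the 460 mentioned above. Note that it counts whitespace-separated words, which is not the same as the number of tokens the model's tokenizer produces.

```python
import json

MAX_WORDS = 460  # word limit mentioned in the post (words, not model tokens)


def truncate_words(text, max_words=MAX_WORDS):
    """Keep at most max_words whitespace-separated words.

    Splitting and re-joining also collapses repeated whitespace.
    """
    words = text.split()
    return " ".join(words[:max_words])


def write_jsonl(texts, path, max_words=MAX_WORDS):
    """Write one {"inputs": ...} JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            record = {"inputs": truncate_words(text, max_words)}
            # json.dumps escapes newlines and quotes, so each record stays on one line
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling `write_jsonl(texts, "month_test.jsonl")` on a list of strings would produce a file that matches `split_type="Line"`, where each line is sent to the model as a separate request.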
I tested it with the first 10 rows (the data has 6k rows in total) and it produced output successfully. However, when I expanded the input to 100 rows, the job failed with the following client error:
2022-01-07T13:33:06.824-08:00 2022-01-07T21:33:03.067:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=SINGLE_RECORD
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.019:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: ClientError: 400
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.019:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl:
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.019:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: Message:
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: {
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: "code": 400,
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: "type": "InternalServerException",
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.020:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: "message": "CUDA error: device-side assert triggered"
2022-01-07T13:33:10.825-08:00 2022-01-07T21:33:10.021:[sagemaker logs]: soa-pax-processed/sentiment_hf/month_test.jsonl: }
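Since the log only names the input file and not the failing record, one way to narrow things down is to check the file locally before resubmitting. The sketch below is only an illustration under stated assumptions: `find_suspect_lines` is a hypothetical helper, and it checks nothing more than JSON validity, the presence of a non-empty string under "inputs", and the whitespace word count. It does not reproduce the model's tokenization, so a line that passes this check may still exceed the model's token limit.

```python
import json


def find_suspect_lines(path, max_words=460):
    """Return (line_number, reason) pairs for lines that look problematic."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                problems.append((number, "empty line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((number, "invalid JSON"))
                continue
            # Each record is expected to be an object with a string "inputs" field
            text = record.get("inputs") if isinstance(record, dict) else None
            if not isinstance(text, str) or not text.strip():
                problems.append((number, "missing or empty 'inputs'"))
            elif len(text.split()) > max_words:
                problems.append((number, "more than %d words" % max_words))
    return problems
```

Running `find_suspect_lines("month_test.jsonl")` on a local copy of the input would list the line numbers worth inspecting first; an empty list only means none of these three simple checks fired.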
Can anyone help me figure out what I'm missing? Any help is highly appreciated!