Background:
I am working with a model I fine-tuned for a multi-class classification problem (distilbert-base-uncased was my base model). I am trying to use my model in a SageMaker batch transform (inference) job.
My data for inference was uploaded to S3 as a JSONL file and looks like this…
{"inputs":"...Some long text string, likely over 512 tokens after tokenization....","parameters":{"return_all_scores":true,"truncation":true,"max_length":512}}
{"inputs":"...Another long text string, likely over 512 tokens after tokenization....","parameters":{"return_all_scores":true,"truncation":true,"max_length":512}}
Note, the parameters were chosen because I need to truncate tokenized input strings that are longer than 512 tokens (per the requirements of my model) and because I want to return the prediction probability for all classes.
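For reference, here is a minimal sketch of how a file in this shape can be produced. The helper name to_jsonl and the example texts are made up for illustration; only the keys and parameter values come from the lines above.

```python
import json


def to_jsonl(texts, max_length=512):
    """Serialize each text as one JSON object per line (JSONL)."""
    # Same parameters as in the sample records above.
    params = {
        "return_all_scores": True,
        "truncation": True,
        "max_length": max_length,
    }
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return "\n".join(
        json.dumps({"inputs": text, "parameters": params}) for text in texts
    )


# Hypothetical example texts, not my real data.
payload = to_jsonl(["first document text", "second document\nwith a line break"])
```

Each line of payload parses on its own with json.loads, which is what split_type='Line' relies on.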
My Code for initiating the batch transform job looks like this…
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=f"s3://{s3_training_job_model}/model.tar.gz",  # path to the trained model
    role=role,
    transformers_version='4.17',
    tensorflow_version='2.6',
    py_version="py38",
    env={'HF_TASK': 'text-classification'})
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy='SingleRecord',
    output_path=f's3://{s3_training_job_data}',
    accept='application/json')
batch_job.transform(
    data=s3_test_data_uri,  # S3 path to my data for inference
    content_type='application/json',
    split_type='Line')
My Problem:
According to SageMaker, my batch transform job failed (the Status in SageMaker shows “Failed”).
I also get an error in my SageMaker notebook, which halts my code.
ODDLY, running this code generates a .out file with the predictions I would expect in the S3 location specified in output_path. It seems like my code is working, but I just can't escape these pesky error messages.
My Error looks like this…
2023-05-04T13:47:48.177:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=SINGLE_RECORD
2023-05-04T13:47:53.326:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: ClientError: 400
2023-05-04T13:47:53.327:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out:
2023-05-04T13:47:53.327:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: Message:
2023-05-04T13:47:53.327:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: {
2023-05-04T13:47:53.327:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: "code": 400,
2023-05-04T13:47:53.327:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: "type": "InternalServerException",
2023-05-04T13:47:53.327:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: "message": "Extra data: line 1 column 46 (char 45)"
2023-05-04T13:47:53.327:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: }
2023-05-04T13:47:53.342:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: ClientError: 400
2023-05-04T13:47:53.343:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out:
2023-05-04T13:47:53.343:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: Message:
2023-05-04T13:47:53.343:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: {
2023-05-04T13:47:53.343:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: "code": 400,
2023-05-04T13:47:53.343:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: "type": "InternalServerException",
2023-05-04T13:47:53.343:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: "message": "pop expected at most 1 argument, got 2"
2023-05-04T13:47:53.343:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: }
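To make the log easier to read, here is a minimal sketch that pulls out which objects the ClientError lines refer to. The regex and the function name objects_with_errors are my own, written only against the line format shown above; they are not part of any SageMaker API.

```python
import re

# Assumed line shape: "<timestamp>:[sagemaker logs]: <object key>: <rest>"
LOG_LINE = re.compile(r"^\S+:\[sagemaker logs\]: (?P<key>\S+?): ?(?P<rest>.*)$")


def objects_with_errors(log_text):
    """Return the set of object keys that have a 'ClientError' line."""
    keys = set()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        # Lines without an object key (e.g. the settings line) do not match.
        if match and "ClientError" in match.group("rest"):
            keys.add(match.group("key"))
    return keys


# Three lines copied from the log above.
sample_log = """\
2023-05-04T13:47:48.177:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=SINGLE_RECORD
2023-05-04T13:47:53.326:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out: ClientError: 400
2023-05-04T13:47:53.342:[sagemaker logs]: sagemaker-us-east-1-/jpIndClassTL-sample20-20230419214517/data/jp_test_data_for_preds.jsonl.out.out: ClientError: 400
"""

failed = objects_with_errors(sample_log)
```

On the log above, the keys it finds end in .jsonl.out and .jsonl.out.out, so both errors are reported against those objects rather than against the original .jsonl file.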
Any thoughts @miOmiO @philschmid ? Thank you in advance to anyone that responds.