Errors: Batch transform on fine-tuned models

A. Batch transform on 1M rows

2022-03-25 06:34:31,078 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Token indices sequence length is longer than the specified maximum sequence length for this model (528 > 512). Running this sequence through the model will result in indexing errors
2022-03-25 06:34:31,092 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /codebuild/output/src257227288/src/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [249,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
2022-03-25 06:34:31,106 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
2022-03-25 06:34:31,107 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2022-03-25 06:34:31,107 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 222, in handle
2022-03-25 06:34:31,107 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     response = self.transform_fn(self.model, input_data, content_type, accept)
2022-03-25 06:34:31,107 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 181, in transform_fn

B. Batch transform on smaller dataset (10K rows)

2022-03-25T07:17:24.961:[sagemaker logs]: sagemaker-us-east-2-460XXXXXXX64/batch_transform/input_head/oot_head_data.jsonl: ClientError: 400
2022-03-25T07:17:24.961:[sagemaker logs]: sagemaker-us-east-2-460XXXXXXX64/batch_transform/input_head/oot_head_data.jsonl: 
2022-03-25T07:17:24.961:[sagemaker logs]: sagemaker-us-east-2-460XXXXXXX64/batch_transform/input_head/oot_head_data.jsonl: Message:
2022-03-25T07:17:24.961:[sagemaker logs]: sagemaker-us-east-2-460XXXXXXX64/batch_transform/input_head/oot_head_data.jsonl: {
2022-03-25T07:17:24.961:[sagemaker logs]: sagemaker-us-east-2-460XXXXXXX64/batch_transform/input_head/oot_head_data.jsonl:   "code": 400,
2022-03-25T07:17:24.961:[sagemaker logs]: sagemaker-us-east-2-460XXXXXXX64/batch_transform/input_head/oot_head_data.jsonl:   "type": "InternalServerException",
2022-03-25T07:17:24.961:[sagemaker logs]: sagemaker-us-east-2-460XXXXXXX64/batch_transform/input_head/oot_head_data.jsonl:   "message": "CUDA error: device-side assert triggered"

2022-03-25 07:17:24,943 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /codebuild/output/src257227288/src/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [251,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

The code I am using to create the batch transform job is as follows:

import csv
import json

from sagemaker.s3 import S3Uploader, s3_path_join

sagemaker_session_bucket = sess.default_bucket()
df_processed_oot['inputs'] = df_processed_oot['detailedissue']
df_processed_oot[['inputs']].head(10000).to_csv(config['oot_head_csv'], index=False)

# dataset files
dataset_csv_file = config['oot_head_csv']
dataset_jsonl_file = "oot_head_data.jsonl"

# convert each csv row into a single-line json record
with open(dataset_csv_file, "r") as infile, open(dataset_jsonl_file, "w") as outfile:
    reader = csv.DictReader(infile)
    for row in reader:
        # remove @
        #row["inputs"] = row["inputs"].replace("@","")
        json.dump(row, outfile)
        outfile.write('\n')

input_s3_path = s3_path_join("s3://", sagemaker_session_bucket, "batch_transform/input_head")
output_s3_path = s3_path_join("s3://", sagemaker_session_bucket, "batch_transform/output_head")
s3_file_uri = S3Uploader.upload(dataset_jsonl_file, input_s3_path)

print(f"{dataset_jsonl_file} uploaded to {s3_file_uri}")

# create Hugging Face Model Class for classifier
huggingface_model = HuggingFaceModel(
   model_data=model_uri, # s3 uri of the fine-tuned model artifact
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.6", # transformers version used
   pytorch_version="1.7", # pytorch version used
   py_version='py36', # python version used
)

# create Transformer to run our batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.g4dn.xlarge',
    output_path=output_s3_path, # s3 path where the batch predictions are written
    strategy='SingleRecord'
)

batch_job.transform(data=s3_file_uri, content_type='application/json', split_type='Line')

I am new to HF and the transformers library; it would be great if someone could help me with the easiest way to apply truncation during batch prediction on a large dataset.

This error says that you are providing inputs longer than the model can handle. You can control this through additional parameters in the jsonl file, but for that to work the model/tokenizer must know the model's max_length.
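
For example, you can check locally what truncation and max_length do at the tokenizer level (the checkpoint name below is just a placeholder; use your fine-tuned model):

from transformers import AutoTokenizer

# placeholder checkpoint -- substitute your fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

print(tokenizer.model_max_length)  # the limit the tokenizer knows about, e.g. 512

# with truncation enabled, the sequence can never exceed the model's position embeddings
ids = tokenizer("some very long text " * 500, truncation=True, max_length=512)["input_ids"]
print(len(ids))  # <= 512, so the CUDA indexing assert cannot be hit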

Thanks for the prompt reply. Is there a sample notebook that shows how to add these parameters to each record when converting the csv into a jsonl file?

You can take a look here: Reference

I ran into a similar issue; the data in your jsonl file for inference should look something like this:

{"inputs":"...Some long text string, likely over 512 tokens after tokenization....","parameters":{"truncation":true,"max_length":512}}
{"inputs":"...Another long text string, likely over 512 tokens after tokenization....","parameters":{"truncation":true,"max_length":512}}