How are the inputs tokenized during model deployment?

Hi.

I’m working through the series of sagemaker-huggingface notebooks, and it is not clear to me how the prediction data is preprocessed before the model is called.

The notebook 01_getting_started_pytorch.ipynb shows these steps:

  • preprocess the datasets
  • save the datasets to S3
  • train the model using the SageMaker Hugging Face API
  • once the model is trained, deploy it and make predictions from input data in a dictionary format like: {"inputs": "blablabla"}

My question is: how is this input data tokenized before it reaches the model?

Hey @Oigres,

Which tokenization step do you mean: training or inference?

For training, the tokenization is done in the preprocessing in the notebook.
For inference, the tokenization is done in the sagemaker-huggingface-inference-toolkit and the toolkit leverages the transformers pipeline.
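To make the inference side more concrete: judging by the traceback in the CloudWatch logs further down this thread (prediction = model(inputs, **parameters)), the toolkit passes the request's "inputs" value to the pipeline positionally and unpacks the optional "parameters" dict as keyword arguments. The sketch below is a simplified, hypothetical imitation of that flow, not the toolkit's code; handle_request and fake_pipeline are illustrative names, and the fake pipeline only echoes what it receives instead of tokenizing.

```python
import json
from typing import Any, Callable, Dict


def handle_request(body: str, model: Callable[..., Any]) -> Any:
    """Hypothetical, simplified version of the toolkit's request handling."""
    data: Dict[str, Any] = json.loads(body)
    inputs = data.pop("inputs")              # raw text to run through the pipeline
    parameters = data.pop("parameters", {})  # optional pipeline keyword arguments
    # Same call shape as in the CloudWatch traceback: model(inputs, **parameters)
    return model(inputs, **parameters)


def fake_pipeline(inputs: str, **kwargs: Any) -> Dict[str, Any]:
    """Stand-in for a transformers pipeline: echoes its arguments."""
    return {"inputs": inputs, "kwargs": kwargs}


body = json.dumps({"inputs": "I love this movie", "parameters": {"truncation": True}})
print(handle_request(body, fake_pipeline))
```

Because tokenization happens inside the pipeline, tokenizer options such as truncation have to be sent in "parameters"; the arguments passed to the tokenizer during training preprocessing are not stored with the model.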

Which tokenization step do you mean: training or inference?
The inference step.

I want to replicate the tokenization performed in the preprocessing notebook at inference time.

I’m currently blocked by this problem. It seems that my inference input is not tokenized the way I specified in the preprocessing notebook. How can I tell the inference step to apply the proper tokenization?

Can you share your code snippet, what you are trying to do, or an error message? I don’t fully understand where you are blocked or what you want to do.

Okay.

Let’s see if I can explain myself better with code.

This is part of my training notebook (some imports may be missing).

As you can see, I set max_length=256, so longer sentences are truncated to this length.

from transformers import AutoTokenizer

# download tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# tokenizer helper function
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True, max_length=256)

# tokenize dataset
train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset = test_dataset.map(tokenize, batched=True)

# set format for pytorch
# train_dataset =  train_dataset.rename_column("label", "labels")
train_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
# test_dataset = test_dataset.rename_column("label", "labels")
test_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
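For readers unfamiliar with these tokenizer options: padding='max_length' pads every example up to max_length, and truncation=True cuts anything longer while still adding the special tokens. The toy below imitates that behaviour with a whitespace split. It is an illustration under assumptions, not how AutoTokenizer works internally: real BERT tokenizers split text into WordPiece sub-words, the word ids here are arbitrary, and the BERT-style [CLS]=101, [SEP]=102, [PAD]=0 ids are assumed for illustration.

```python
from typing import Dict, List

CLS, SEP, PAD = 101, 102, 0  # BERT-style special token ids (assumed for illustration)


def toy_tokenize(text: str, vocab: Dict[str, int], max_length: int) -> Dict[str, List[int]]:
    """Hypothetical whitespace tokenizer imitating
    tokenizer(text, padding='max_length', truncation=True, max_length=max_length)."""
    # assign an arbitrary id to each new word (real tokenizers use a fixed vocab)
    ids = [vocab.setdefault(word, len(vocab) + 1000) for word in text.split()]
    ids = ids[: max_length - 2]           # truncation: leave room for [CLS] and [SEP]
    ids = [CLS] + ids + [SEP]
    attention_mask = [1] * len(ids)       # 1 = real token
    padding = max_length - len(ids)       # padding='max_length': fill up to max_length
    return {
        "input_ids": ids + [PAD] * padding,
        "attention_mask": attention_mask + [0] * padding,  # 0 = padding
    }


print(toy_tokenize("the movie was great", {}, 8))
# input_ids: [101, 1000, 1001, 1002, 1003, 102, 0, 0]
```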

import botocore
from datasets.filesystems import S3FileSystem

s3 = S3FileSystem()

# save train_dataset to s3
training_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/train'
train_dataset.save_to_disk(training_input_path, fs=s3)

# save test_dataset to s3
test_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/test'
test_dataset.save_to_disk(test_input_path, fs=s3)

from sagemaker.huggingface import HuggingFace

# hyperparameters, which are passed into the training job
hyperparameters = {
    'epochs': 3,
    'train_batch_size': 32,
    'model_name': checkpoint,
}

huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
    hyperparameters=hyperparameters,
)

huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path})

Now for prediction. I will predict on a sentence longer than 512 tokens (longer than my max_length, and also longer than BERT’s maximum input length of 512 tokens).

predictor = huggingface_estimator.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")

long_sentence = "......"  # longer than 512 tokens
sentiment_input = {"inputs": long_sentence}
predictor.predict(sentiment_input)

This throws the following error:

[screenshot of the error message, not preserved]

The deployed endpoint is not using my tokenizer settings: if it were, every sentence would be truncated to 256 tokens, but that is not happening. So the question is: how can I force the endpoint to use my own tokenizer settings at inference time?

Training and inference are two completely separate steps. You are using the same tokenizer, but not the same configuration.

First of all, it is not possible to predict on a sequence longer than 512 tokens with the model you are using. You can either use a model that supports longer input sequences, e.g. Longformer, or truncate your inputs in advance so that you only send inputs of at most 512 tokens.

Alternatively, you can send a parameters key in your request to truncate any incoming sequence automatically, meaning the inference pipeline will cut the input after 512 tokens.

long_sentence = "...."  # longer than 512 tokens
sentiment_input = {
    "inputs": long_sentence,
    "parameters": {"truncation": True},
}
predictor.predict(sentiment_input)
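Why a 520-token input fails and how truncation fixes it can be shown with a toy model whose position-embedding table has 512 rows, as BERT-base does. toy_forward and toy_pipeline are hypothetical stand-ins: the error text mirrors the one in the CloudWatch logs later in this thread, and unlike a real tokenizer, this toy truncation simply slices the id list instead of preserving the trailing [SEP] token.

```python
from typing import List

MAX_POSITIONS = 512  # BERT-base has 512 position embeddings


def toy_forward(input_ids: List[int]) -> int:
    """Stand-in for the model's forward pass: like BERT, it cannot handle
    more tokens than it has position embeddings. Returns the sequence length."""
    if len(input_ids) > MAX_POSITIONS:
        raise RuntimeError(
            f"The size of tensor a ({len(input_ids)}) must match the size of "
            f"tensor b ({MAX_POSITIONS}) at non-singleton dimension 1"
        )
    return len(input_ids)


def toy_pipeline(input_ids: List[int], truncation: bool = False) -> int:
    """Hypothetical pipeline: cuts the input to MAX_POSITIONS only when
    truncation=True (a real tokenizer would also keep [SEP] at the end)."""
    if truncation:
        input_ids = input_ids[:MAX_POSITIONS]
    return toy_forward(input_ids)


long_input = list(range(520))  # 520 ids, the length reported in the logs
print(toy_pipeline(long_input, truncation=True))  # 512
try:
    toy_pipeline(long_input)  # no truncation: fails like the endpoint did
except RuntimeError as err:
    print(err)
```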

Yes, I know it is not possible to predict with inputs longer than 512 tokens, and that is exactly what bothered me, since what I wanted to do was use my own tokenizer at inference time.


This solution seems very helpful. I didn’t know I could customize how the input is processed with parameters. Where can I learn more about this? What other parameters can I customize, and where is this documented?

Thank you very much for your time.

You can learn more about the Inference Toolkit here: Deploy models to Amazon SageMaker

In addition to the Hugging Face Inference Deep Learning Containers, we created a new Inference Toolkit for SageMaker. This new Inference Toolkit leverages the pipelines from the transformers library to allow zero-code deployments of models, without requiring any code for pre- or post-processing.

This means the parameters key supports all optional parameters of the transformers pipelines (see the Pipelines page of the transformers 4.10.0 documentation).

Hi again…

I tried the solution on an endpoint and it works perfectly, but I’m having issues with batch transform prediction, which is actually my goal.

Batch transform requires the input data to be stored in a JSON Lines (.jsonl) file on S3, so I created a Python list of dictionaries like this and later saved it as JSON Lines on S3:

[
 {"inputs": "long sentence 1", "parameters": {"trucation": True}},  
 {"inputs": "long sentence 2", "parameters": {"trucation": True}}, 
 {"inputs": "long sentence 3", "parameters": {"trucation": True}}, 
 {"inputs": "long sentence 4", "parameters": {"trucation": True}},
]

Saved as test.jsonl on S3:

{"inputs": "long sentence 1", "parameters": {"trucation": true}} 
{"inputs": "long sentence 2", "parameters": {"trucation": true}}
{"inputs": "long sentence 3", "parameters": {"trucation": true}}
{"inputs": "long sentence 4", "parameters": {"trucation": true}}

Be aware that in JSON Lines, True becomes true. Could this be the cause of the error?
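On the True/true question: json.dumps writes Python's True as the JSON literal true, which is the correct JSON spelling, and json.loads turns it back into True. A minimal sketch of producing JSON Lines with the standard library; the records are placeholders and the S3 upload step is left out.

```python
import json
from typing import Any, Dict, List

# Placeholder records in the same shape as the batch input above.
records: List[Dict[str, Any]] = [
    {"inputs": "long sentence 1", "parameters": {"truncation": True}},
    {"inputs": "long sentence 2", "parameters": {"truncation": True}},
]


def to_jsonl(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows as JSON Lines: one complete JSON document per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


jsonl_text = to_jsonl(records)
print(jsonl_text)
# Round trip: each line parses independently and true comes back as True.
assert [json.loads(line) for line in jsonl_text.splitlines()] == records
```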

Then this is my code:


from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=model_path,  # path to your trained sagemaker model
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.6", # transformers version used
   pytorch_version="1.7", # pytorch version used
   py_version="py36", # python version of the DLC
)

# create Transformer to run our batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path=upload_path, # we are using the same s3 path to save the output with the input
    strategy='SingleRecord')

# starts batch transform job and uses s3 data as input
batch_job.transform(
    data=s3_file_uri,
    content_type='application/json',    
    split_type='Line')

And I get this error:

I don’t know what to do to make this code work.

Hey @Oigres,

We have a separate section on batch transform in the documentation, which contains a YouTube video and a sample notebook. The notebook shows how you can create the jsonl file for your batch transform.

As far as I know, batch transform works like this: SageMaker sends each line of the jsonl file as a normal HTTP request to the inference toolkit. This means each line should be a valid JSON document, so true should be correct.

Could you share your CloudWatch logs? Maybe they say more about the error.
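A rough local imitation of the line splitting described above, where each non-empty line is decoded on its own with json.loads. This is an assumption about the behaviour made for illustration, not SageMaker's implementation; split_and_decode is a hypothetical name. It also shows that two JSON documents on the same line raise JSONDecodeError with the message "Extra data", the same kind of error that appears in the CloudWatch logs below.

```python
import json
from typing import Any, List


def split_and_decode(payload: str) -> List[Any]:
    """Local approximation of split_type='Line' + strategy='SingleRecord':
    every non-empty line is decoded as its own, independent JSON document."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


one_doc_per_line = '{"inputs": "a", "parameters": {"truncation": true}}\n{"inputs": "b"}\n'
print(split_and_decode(one_doc_per_line))

# Two documents on the SAME line are rejected by json.loads:
two_docs_one_line = '{"inputs": "a"} {"inputs": "b"}'
try:
    split_and_decode(two_docs_one_line)
except json.JSONDecodeError as err:
    print("decode failed:", err.msg)  # err.msg is "Extra data"
```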

Of course, here you go. Thank you in advance!

"2021-09-02 16:43:42,225 [INFO ] main com.amazonaws.ml.mms.ModelServer - "
MMS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 8
Max heap size: 12949 M
Python executable: /opt/conda/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
"2021-09-02 16:43:42,323 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model"
"2021-09-02 16:43:42,432 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp"
"2021-09-02 16:43:42,433 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000"
"2021-09-02 16:43:42,434 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 48"
"2021-09-02 16:43:42,434 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started."
"2021-09-02 16:43:42,434 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.13"
"2021-09-02 16:43:42,435 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded."
"2021-09-02 16:43:42,442 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel."
"2021-09-02 16:43:42,453 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000"
"2021-09-02 16:43:42,525 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080"
"2021-09-02 16:43:42,526 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000."
Model server started.
"2021-09-02 16:43:42,530 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet."
"2021-09-02 16:43:43,658 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.255.130:42100 ""GET /ping HTTP/1.1"" 200 16"
"2021-09-02 16:43:43,670 [INFO ] epollEventLoopGroup-3-2 ACCESS_LOG - /169.254.255.130:42112 ""GET /execution-parameters HTTP/1.1"" 404 1"
"2021-09-02 16:43:47,797 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=0242a9fffefeff83-00000019-00000001-b754649245650bfe-427692ab"
"2021-09-02 16:43:47,799 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 5210"
"2021-09-02 16:43:47,801 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-1"
"2021-09-02 16:43:48,242 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 0.1277923583984375 ms"
"2021-09-02 16:43:48,242 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 1630601028241.526 ms"
"2021-09-02 16:43:48,243 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.20599365234375 ms"
"2021-09-02 16:43:48,243 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 440"
"2021-09-02 16:43:48,243 [INFO ] W-9000-model ACCESS_LOG - /169.254.255.130:42116 ""POST /invocations HTTP/1.1"" 200 4525"
"2021-09-02 16:43:48,309 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Token indices sequence length is longer than the specified maximum sequence length for this model (520 > 512). Running this sequence through the model will result in indexing errors"
"2021-09-02 16:43:48,318 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error"
"2021-09-02 16:43:48,318 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):"
"2021-09-02 16:43:48,318 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 11"
"2021-09-02 16:43:48,318 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 222, in handle"
"2021-09-02 16:43:48,318 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     response = self.transform_fn(self.model, input_data, content_type, accept)"
"2021-09-02 16:43:48,318 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 181, in transform_fn"
"2021-09-02 16:43:48,319 [INFO ] W-9000-model ACCESS_LOG - /169.254.255.130:42116 ""POST /invocations HTTP/1.1"" 400 13"
"2021-09-02 16:43:48,319 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     predictions = self.predict(processed_data, model)"
"2021-09-02 16:43:48,319 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 147, in predict"
"2021-09-02 16:43:48,319 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     prediction = model(inputs, **parameters)"
"2021-09-02 16:43:48,319 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/transformers/pipelines/text_classification.py"", line 65, in __call__"
"2021-09-02 16:43:48,319 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     outputs = super().__call__(*args, **kwargs)"
"2021-09-02 16:43:48,320 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/transformers/pipelines/base.py"", line 676, in __call__"
"2021-09-02 16:43:48,320 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return self._forward(inputs)"
"2021-09-02 16:43:48,320 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/transformers/pipelines/base.py"", line 697, in _forward"
"2021-09-02 16:43:48,320 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     predictions = self.model(**inputs)[0].cpu()"
"2021-09-02 16:43:48,320 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py"", line 727, in _call_impl"
"2021-09-02 16:43:48,320 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     result = self.forward(*input, **kwargs)"
"2021-09-02 16:43:48,320 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py"", line 1511, in forward"
"2021-09-02 16:43:48,321 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return_dict=return_dict,"
"2021-09-02 16:43:48,321 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py"", line 727, in _call_impl"
"2021-09-02 16:43:48,321 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     result = self.forward(*input, **kwargs)"
"2021-09-02 16:43:48,321 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py"", line 969, in forward"
"2021-09-02 16:43:48,322 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     past_key_values_length=past_key_values_length,"
"2021-09-02 16:43:48,322 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py"", line 727, in _call_impl"
"2021-09-02 16:43:48,322 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     result = self.forward(*input, **kwargs)"
"2021-09-02 16:43:48,322 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py"", line 207, in forward"
"2021-09-02 16:43:48,322 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     embeddings += position_embeddings"
"2021-09-02 16:43:48,323 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - RuntimeError: The size of tensor a (520) must match the size of tensor b (512) at non-singleton dimension 1"
"2021-09-02 16:43:48,323 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "
"2021-09-02 16:43:48,323 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred:"
"2021-09-02 16:43:48,323 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "
"2021-09-02 16:43:48,323 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):"
"2021-09-02 16:43:48,323 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/mms/service.py"", line 108, in predict"
"2021-09-02 16:43:48,324 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     ret = self._entry_point(input_batch, self.context)"
"2021-09-02 16:43:48,324 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 231, in handle"
"2021-09-02 16:43:48,324 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise PredictionException(str(e), 400)"
"2021-09-02 16:43:48,324 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: The size of tensor a (520) must match the size of tensor b (512) at non-singleton dimension 1 : 400"
"2021-09-02 16:43:48,337 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error"
"2021-09-02 16:43:48,337 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1"
"2021-09-02 16:43:48,337 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):"
"2021-09-02 16:43:48,338 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 222, in handle"
"2021-09-02 16:43:48,338 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     response = self.transform_fn(self.model, input_data, content_type, accept)"
"2021-09-02 16:43:48,338 [INFO ] W-9000-model ACCESS_LOG - /169.254.255.130:42142 ""POST /invocations HTTP/1.1"" 400 3"
"2021-09-02 16:43:48,338 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 179, in transform_fn"
"2021-09-02 16:43:48,338 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     processed_data = self.preprocess(input_data, content_type)"
"2021-09-02 16:43:48,338 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 127, in preprocess"
"2021-09-02 16:43:48,339 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     decoded_input_data = decoder_encoder.decode(input_data, content_type)"
"2021-09-02 16:43:48,339 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/decoder_encoder.py"", line 89, in decode"
"2021-09-02 16:43:48,339 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return decoder(content)"
"2021-09-02 16:43:48,339 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/decoder_encoder.py"", line 34, in decode_json"
"2021-09-02 16:43:48,339 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return json.loads(content)"
"2021-09-02 16:43:48,340 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/json/__init__.py"", line 354, in loads"
"2021-09-02 16:43:48,340 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return _default_decoder.decode(s)"
"2021-09-02 16:43:48,340 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/json/decoder.py"", line 342, in decode"
"2021-09-02 16:43:48,340 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise JSONDecodeError(""Extra data"", s, end)"
"2021-09-02 16:43:48,340 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - json.decoder.JSONDecodeError: Extra data: line 1 column 50 (char 49)"
"2021-09-02 16:43:48,341 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred:"
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):"
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/mms/service.py"", line 108, in predict"
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     ret = self._entry_point(input_batch, self.context)"
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File ""/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py"", line 231, in handle"
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise PredictionException(str(e), 400)"
"2021-09-02 16:43:48,342 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: Extra data: line 1 column 50 (char 49) : 400"

The error is pretty clear: it looks like truncation is not being applied correctly.
Which model are you using?

I just noticed a spelling mistake: you wrote trucation instead of truncation.
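In this thread the misspelled key appears not to have failed on its own: the request went through and only broke later with the 520 vs 512 size mismatch. A small client-side check before uploading the batch file can catch such typos. The sketch below uses difflib from the standard library; check_parameters is a hypothetical helper and KNOWN_PARAMETERS is a hypothetical, non-exhaustive set chosen for illustration, so extend it with the pipeline parameters you actually use.

```python
import difflib
from typing import Dict, Iterable, List

# Illustrative and NOT exhaustive: parameter names we expect to send.
KNOWN_PARAMETERS = {"truncation", "padding", "max_length"}


def check_parameters(params: Dict[str, object], known: Iterable[str] = KNOWN_PARAMETERS) -> List[str]:
    """Return a warning for each key not in `known`,
    with a close-match suggestion when one exists."""
    known = list(known)
    warnings = []
    for key in params:
        if key not in known:
            suggestion = difflib.get_close_matches(key, known, n=1)
            hint = f" (did you mean {suggestion[0]!r}?)" if suggestion else ""
            warnings.append(f"unknown parameter {key!r}{hint}")
    return warnings


print(check_parameters({"trucation": True}))
```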


Yes, it seems so 😅

I corrected it and now it works.

Thanks so much!
