I created a SageMaker serverless endpoint that serves a fine-tuned text classification model… Now, when I try to invoke it with a sequence longer than the model's maximum input length (514 tokens), it correctly returns the following error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
"code": 400,
"type": "InternalServerException",
"message": "The expanded size of the tensor (997) must match the existing size (514) at non-singleton dimension 1. Target sizes: [1, 997]. Tensor sizes: [1, 514]"
}
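For context, the call that triggers the error looks roughly like this (a sketch; the endpoint name is a placeholder and the actual payload is much longer):

import json

import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-serverless-classifier",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "A text with more than 514 tokens..."}),
)
print(json.loads(response["Body"].read()))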
To make sure the model can handle inputs of any length through truncation, I updated the model's tokenizer_config.json with an additional argument, "model_max_length": 514, but unfortunately the error remains the same.
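For reference, the tokenizer_config.json now contains roughly the following (other keys omitted):

{
  "model_max_length": 514
}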
Am I working on the wrong part of the model? Do I have to set it in tokenizer.json?
Hi @philschmid! No, currently I don’t.
I know that this solution exists, but I'm wondering whether there is a way to configure the model itself to apply truncation by default.
The idea is to make the model available in the simplest way possible, so my users don't have to worry about adding NLP-specific parameters to their requests.
I'll give truncation a try, just to see whether it works as a workaround. Could I also pass max_length in the request?
# predictor is the Predictor object returned when the endpoint was deployed
data = {
    "parameters": {"truncation": True},  # Python's True, not JSON's true
    "inputs": "Text longer than 514 tokens",
}
res = predictor.predict(data=data)
print(res)
and it works as expected.
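For max_length, my understanding (not verified for every toolkit version) is that the parameters dict is forwarded to the pipeline as tokenizer arguments, so both should be possible together:

# Hedged sketch: assumes "parameters" is passed through to the pipeline's
# tokenizer, as the successful truncation test suggests.
data = {
    "parameters": {"truncation": True, "max_length": 514},
    "inputs": "Text longer than 514 tokens",
}
res = predictor.predict(data=data)
print(res)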
Is there a way to make this the default behavior through the tokenizer config files?
I'd like to make my model available via an AWS API Gateway, and to keep concerns separated, my users shouldn't have to worry about NLP-specific topics like truncation.
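One option I'm considering (a sketch only, not verified end to end): the SageMaker Hugging Face Inference Toolkit lets you ship a custom code/inference.py inside the model archive and override its handler functions, so truncation could be enforced server-side and stay invisible to the API Gateway clients:

# code/inference.py -- hedged sketch of a custom handler for the SageMaker
# Hugging Face Inference Toolkit. Overrides model_fn and predict_fn so that
# truncation is always applied, no matter what the request contains.
from transformers import pipeline


def model_fn(model_dir):
    # Load the fine-tuned classifier from the unpacked model archive.
    return pipeline("text-classification", model=model_dir, tokenizer=model_dir)


def predict_fn(data, classifier):
    inputs = data.pop("inputs", data)
    # Force truncation server-side so callers never have to set it themselves.
    return classifier(inputs, truncation=True, max_length=514)

With this in place, a plain {"inputs": "..."} request would be truncated automatically.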