How are the inputs tokenized during model deployment?

Training and inference are two completely different things. You are using the same tokenizer, but not with the same configuration.

First of all, it is not possible to predict on a sequence longer than 512 tokens with the model you are using. This means you can either switch to a model that supports longer input sequences, e.g. Longformer, or truncate your inputs in advance so that you only send inputs shorter than 512 tokens.
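If you want to truncate on the client side, a minimal sketch could look like the one below. It assumes your endpoint was deployed from a checkpoint such as distilbert-base-uncased-finetuned-sst-2-english (a placeholder; use the checkpoint your endpoint actually serves) and that predictor is the SageMaker predictor from your deployment, as in the snippet further down.

from transformers import AutoTokenizer

# Load the same tokenizer the deployed model was trained with.
# The checkpoint name here is only an example; replace it with your own.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

long_sentence = "...."  # longer than 512 tokens

# Tokenize with truncation so at most 512 tokens remain, then decode back to text
# so the payload sent to the endpoint already fits the model's limit.
input_ids = tokenizer(long_sentence, truncation=True, max_length=512)["input_ids"]
truncated_sentence = tokenizer.decode(input_ids, skip_special_tokens=True)

predictor.predict({"inputs": truncated_sentence})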

Additionally, you could use the parameters key of your request to automatically truncate any incoming sequence, meaning the inference pipeline would cut the input after 512 tokens:

long_sentence = "...."  # longer than 512 tokens

sentiment_input = {
    "inputs": long_sentence,
    "parameters": {"truncation": True}
}

predictor.predict(sentiment_input)