Training and inference are two different things: you are using the same tokenizer, but not the same configuration.
First of all, the model you use cannot process an input longer than 512 tokens. That means you can either switch to a model that supports longer input sequences, e.g. Longformer, or truncate your inputs in advance so that only inputs shorter than 512 tokens are sent.
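If you truncate on the client side, count tokens the same way the model does. As a rough sketch (whitespace splitting stands in for the real tokenizer here; in practice you would use the model's own tokenizer, e.g. `transformers.AutoTokenizer`, so the count matches what the endpoint sees):

```python
MAX_TOKENS = 512  # the model's maximum input length

def truncate_input(text: str, max_tokens: int = MAX_TOKENS) -> str:
    # Placeholder tokenization via whitespace split; swap in the
    # model's actual tokenizer for correct token counts.
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

short = truncate_input("word " * 1000)
print(len(short.split()))  # -> 512
```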
Alternatively, you can send a `parameters` key in your request configuration to truncate incoming sequences automatically, meaning the inference pipeline would cut any input after 512 tokens.
long_sentence = "...." # longer than 512 tokens

sentiment_input = {
    'inputs': long_sentence,
    'parameters': {'truncation': True}
}

predictor.predict(sentiment_input)