Hi Nasheed, I’m quite curious about your use case and why you’re interested in never partially truncating, if you don’t mind sharing!
In any case, here is how I would do it: Increase max_length by 1 and tokenize the text. Then look at the tokens themselves (rather than the decoded string) and check whether the second-to-last token, the one before the final [SEP] token, starts with ## (the prefix that marks the continuation of a word that was split into several tokens). If yes, the last word was cut in the middle, so remove that token and the earlier pieces of the same word: keep removing tokens until you have removed the one that does not start with ##. If not, just remove the token before the [SEP] token. Either way you end up with at most max_length tokens and no partial word at the end.
In your example it would be
[CLS] I am Nasheed and I like xylo ##phones [SEP]
Because the second-to-last token starts with ##, you would remove that token and the token before it.
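If it helps, here is a rough sketch of those steps in plain Python. It assumes a BERT-style WordPiece sequence that starts with [CLS] and ends with [SEP], and that you already have the list of tokens produced with max_length + 1. The function name trim_to_whole_words and its parameters are made up for illustration, they are not part of any library.

```python
def trim_to_whole_words(tokens, max_length, prefix="##"):
    """Drop the look-ahead token and any partially truncated last word.

    `tokens` is assumed to be a WordPiece token list that includes the
    leading [CLS] and trailing [SEP], produced by truncating at
    max_length + 1 (hypothetical helper, not a library function).
    """
    if len(tokens) <= max_length:
        # The text fits without truncation, so nothing has to be removed.
        return list(tokens)

    body, end_token = list(tokens[:-1]), tokens[-1]
    extra = body.pop()  # the one extra token we asked for

    if extra.startswith(prefix):
        # The extra token continues a word, so that word was cut in the
        # middle. Remove its remaining continuation pieces...
        while body and body[-1].startswith(prefix):
            body.pop()
        # ...and then its first piece (but never the leading [CLS]).
        if len(body) > 1:
            body.pop()

    return body + [end_token]


# With Hugging Face transformers, the token list could be obtained roughly
# like this (assumption: `tokenizer` is a BERT-style tokenizer):
#   ids = tokenizer(text, truncation=True, max_length=max_length + 1)["input_ids"]
#   tokens = tokenizer.convert_ids_to_tokens(ids)

example = ["[CLS]", "I", "am", "Nasheed", "and", "I", "like",
           "xylo", "##phones", "[SEP]"]
print(trim_to_whole_words(example, max_length=9))
```

The while loop is there because a word can be split into more than two pieces, in which case removing only two tokens would still leave a partial word behind.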
Hope that helps.
Cheers
Heiko