loaded_model and loaded_tokenizer were trained earlier using DistilBERT.
I've tried using TextClassificationPipeline, but unfortunately email_text has too many tokens (more than 512), so it doesn't work. If I truncate email_text, it returns an incorrect prediction_value.
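For reference, this is roughly what I'm running (simplified; the training code for loaded_model and loaded_tokenizer is omitted):

```python
from transformers import TextClassificationPipeline

# loaded_model / loaded_tokenizer: the DistilBERT classifier and tokenizer trained earlier
classifier = TextClassificationPipeline(model=loaded_model, tokenizer=loaded_tokenizer)

# Fails when email_text exceeds 512 tokens; truncating email_text first
# runs, but the resulting prediction_value is often wrong.
prediction_value = classifier(email_text)[0]["label"]
```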
I searched to see whether it was some kind of bug or configuration issue, but it seems to be a fundamental limit of the model (DistilBERT's 512-token maximum sequence length) rather than something I can configure away.
By the way, you can pass model and tokenizer options to the pipeline in addition to the pipeline's own options. Check the model description to see which options are available; that should make this problem straightforward to solve.
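For example, tokenizer options such as truncation and max_length can be forwarded through the pipeline call (just a sketch of the mechanism, using your variable names):

```python
from transformers import TextClassificationPipeline

classifier = TextClassificationPipeline(model=loaded_model, tokenizer=loaded_tokenizer)

# Keyword arguments passed at call time are forwarded to the tokenizer,
# so the input is cut to the model's 512-token limit instead of erroring out.
result = classifier(email_text, truncation=True, max_length=512)
```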
Thank you. It hadn't occurred to me that the input could be chunked. Although it'll make the classification slower, if it makes it more accurate, that'll be really helpful. I'll look into it.
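In case it helps anyone later, this is the kind of chunk-and-aggregate approach I'm planning to try (just a sketch: the 256-token stride and averaging of per-chunk probabilities are my own choices, and it assumes a fast tokenizer):

```python
import torch

def classify_long_text(text, model, tokenizer, max_length=512, stride=256):
    # Tokenize the whole email once, splitting it into overlapping windows
    # that each respect the model's 512-token limit.
    encoding = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )

    model.eval()
    with torch.no_grad():
        logits = model(
            input_ids=encoding["input_ids"],          # shape: (num_chunks, seq_len)
            attention_mask=encoding["attention_mask"],
        ).logits

    # Average the per-chunk probabilities and pick the most likely label.
    probs = torch.softmax(logits, dim=-1).mean(dim=0)
    return model.config.id2label[int(probs.argmax())]

# prediction_value = classify_long_text(email_text, loaded_model, loaded_tokenizer)
```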