Hi all! I have DistilBert fine-tuned for a sequence classification task, and I'm struggling to use the fine-tuned model to classify large batches of input efficiently.
The tokeniser runs very quickly (2k it/s), but actually applying the model to the tokenized input is very slow (15 it/s).
Does anyone know of resources or best practices for optimising inference performance?
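For reference, here is a minimal sketch of the usual first steps, assuming a Hugging Face-style setup (e.g. `model = DistilBertForSequenceClassification.from_pretrained(path)` and `tokenizer = DistilBertTokenizerFast.from_pretrained(path)`). It puts the model in eval mode, disables autograd with `torch.inference_mode()`, feeds the model whole batches instead of single examples, moves tensors to the GPU when one is available, and sorts inputs by length so each batch is padded only to its own longest sequence. The function name `classify_in_batches` and the `batch_size` default are hypothetical, and sorting by character length is only a rough proxy for token length.

```python
import torch


def classify_in_batches(model, tokenizer, texts, batch_size=64, device=None):
    """Return predicted label ids for `texts`, in the original order.

    Assumes (hypothetically) a Hugging Face-style `tokenizer` that returns a
    dict of tensors, and a classification `model` whose output has `.logits`.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()  # turn off dropout for inference

    # Group similar-length texts so per-batch padding stays small
    # (character length is only an approximation of token length).
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    preds = [None] * len(texts)

    with torch.inference_mode():  # skip gradient bookkeeping entirely
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            batch = [texts[i] for i in idx]
            # padding=True pads only to the longest sequence in this batch.
            enc = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
            enc = {k: v.to(device) for k, v in enc.items()}
            logits = model(**enc).logits
            # Write each prediction back to its original position.
            for i, p in zip(idx, logits.argmax(dim=-1).tolist()):
                preds[i] = p
    return preds
```

If the model is currently being called once per example, batching alone often accounts for most of the gap; `batch_size` is worth tuning against available GPU/CPU memory.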