How to optimise transformer inference speed for batches of inputs?

Hi all! I have DistilBert fine-tuned for a sequence classification task, and I'm struggling to classify large batches of inputs efficiently with the fine-tuned model.

The tokeniser runs very quickly (~2k it/s), but actually applying the model to the tokenised input is very slow (~15 it/s).
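For context, here's roughly what my inference loop looks like — a minimal sketch, assuming a stock DistilBert sequence-classification checkpoint (the model name, input texts, and batch size below are placeholders, not my actual setup):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder checkpoint; in practice this is my fine-tuned model directory.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
model.eval()  # disable dropout for inference

texts = ["an example sentence to classify"] * 64  # placeholder inputs
batch_size = 32

preds = []
with torch.no_grad():  # skip gradient bookkeeping during inference
    for i in range(0, len(texts), batch_size):
        # Tokenise one batch at a time; padding=True pads to the longest
        # sequence in the batch rather than a fixed max length.
        batch = tokenizer(
            texts[i : i + batch_size],
            padding=True,
            truncation=True,
            return_tensors="pt",
        ).to(device)
        logits = model(**batch).logits
        preds.extend(logits.argmax(dim=-1).tolist())
```

Is this the right general shape, or am I missing something obvious (e.g. larger batches, fp16, ONNX export)?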

Does anyone know of resources or best practices for optimising inference performance?