Runtime of XLM-R model is too high

I have finetuned XLM-RoBERTa on a text classification dataset using TensorFlow-Keras. I finetuned the model on a Google Colab GPU and am testing it on a Google Colab CPU. I used the methods below to save, load, and run the model.


loaded_model = TFRobertaForSequenceClassification.from_pretrained('/content/drive/MyDrive/trained_model')

loaded_model.predict((encoded_dict['input_ids'], encoded_dict['attention_mask']))

This works well, but the inference time for a single document is 3.6 seconds, which is too high. How can I make the model run faster?
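One thing worth checking before optimizing: in TF-Keras, the first `predict` call also traces and compiles the computation graph, so a single timed call can overstate steady-state latency. Below is a minimal, hedged sketch of a timing harness with warm-up runs; `predict` here is a hypothetical stand-in for the real `loaded_model.predict` call, so the snippet is self-contained.

```python
import time

def predict(inputs):
    # Hypothetical stand-in for loaded_model.predict(...);
    # replace with the real call when measuring your model.
    time.sleep(0.01)  # simulate model work
    return [0]

def mean_latency(inputs, warmup=2, runs=10):
    """Average predict() latency after warm-up calls, so that
    one-time graph tracing does not inflate the measurement."""
    for _ in range(warmup):
        predict(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        predict(inputs)
    return (time.perf_counter() - start) / runs

latency = mean_latency({"input_ids": [], "attention_mask": []})
print(f"mean latency per document: {latency:.3f}s")
```

If the steady-state number is still high on CPU, the usual levers are batching multiple documents per `predict` call and reducing the padded sequence length passed to the tokenizer.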