I have fine-tuned XLM-RoBERTa on a text classification dataset using TensorFlow/Keras. I fine-tuned the model on a Google Colab GPU and am testing it on a Google Colab CPU. I used the method below to load the saved model:
loaded_model = TFRobertaForSequenceClassification.from_pretrained('/content/drive/MyDrive/trained_model')
This works, but inference takes 3.6 seconds for a single document, which is too slow. How can I make the model run faster?
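To rule out one-off graph tracing as the cause, the latency can be measured after a warm-up call. Here is a minimal timing sketch; `predict_fn` is a hypothetical wrapper around the tokenizer plus `loaded_model` call, not code from my project:

```python
import time

def time_inference(predict_fn, doc, n_runs=5):
    # Warm-up call: the first TF/Keras prediction triggers graph tracing,
    # which can dominate a single-document measurement.
    predict_fn(doc)
    start = time.perf_counter()
    for _ in range(n_runs):
        predict_fn(doc)
    # Mean seconds per document over the timed runs
    return (time.perf_counter() - start) / n_runs
```

If the per-call time stays high after warm-up, the cost is in the model itself rather than in TensorFlow's first-call tracing.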