Hi, I have noticed that inference is very quick when running the model on a single batch. However, once inference is run in a loop, even on the same input, it slows down significantly.
I have actually seen the same behaviour with TensorFlow models. Is this expected behaviour, or is it an issue with CUDA, etc.?
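For reference, this is roughly the measurement pattern I mean (a simplified sketch assuming a PyTorch model on CUDA; the model choice and loop count here are placeholders, and the actual code is in the notebook linked below):

```python
import time
import torch
import torchvision.models as models

# Placeholder model and input; the notebook uses its own.
model = models.resnet18().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    # One-off inference: appears very fast.
    start = time.perf_counter()
    model(x)
    print(f"single run: {time.perf_counter() - start:.4f}s")

    # Same input in a loop: the average per-iteration time
    # comes out much higher than the single run above.
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    print(f"loop: {(time.perf_counter() - start) / 100:.4f}s per run")

# Note: these are naive wall-clock timings with no explicit
# torch.cuda.synchronize(), so CUDA's asynchronous kernel
# launches may affect what actually gets measured.
```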
Please see the notebook below to reproduce the issue:
https://colab.research.google.com/drive/1gqSzQqFm8HL0OwmJzSRlcRFQ3FOpnvFh?usp=sharing