How to make single-input inference faster? Create my own pipeline?

Sorry for using my alt.

What I mean is that you should check whether your GPU is actually running at 100% utilization during inference (nvidia-smi -l 1 refreshes the readout every second).

Could you instrument your function by printing the elapsed time at each step? The cause of the slowdown might then come out clearer.
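Something like the following is what I have in mind for the timing. It is only a rough sketch: `timed`, `timings`, and `run_inference` are names I made up, and the three steps are placeholders for whatever your function really does (tokenizing, the model forward pass, decoding, and so on).

```python
import time
from contextlib import contextmanager

# Collected timings, keyed by step name (hypothetical helper state).
timings = {}

@contextmanager
def timed(label):
    """Print and record the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = elapsed
        print(f"{label}: {elapsed * 1000:.2f} ms")

def run_inference(text):
    # Placeholder steps: swap in your real preprocessing, model call,
    # and postprocessing.
    with timed("preprocess"):
        tokens = text.lower().split()
    with timed("forward"):
        # If this step runs on a GPU through PyTorch, call
        # torch.cuda.synchronize() before the block ends. CUDA kernels
        # are launched asynchronously, so without it the time can be
        # attributed to a later step.
        scores = [len(t) for t in tokens]
    with timed("postprocess"):
        result = max(scores) if scores else 0
    return result

if __name__ == "__main__":
    run_inference("Hello big world")
```

Run it a few times on the same input and ignore the first call, since warm-up (model loading, CUDA initialization) usually dominates it. If one step takes most of the time while the GPU sits well below 100%, that step is probably where the slowdown is.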