How to make single-input inference faster? Create my own pipeline?

+1 to what @Narsil said. I was just going to suggest the golden rule of optimizing: measure first! Time how long each piece of code takes, over a few runs and a few different configurations (input lengths, how many inputs you’re predicting at once, that kind of thing). Then you’ll know where you can best focus your efforts — see the rough sketch below for what I mean.
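
Here’s a minimal timing sketch, just as an illustration and not a definitive benchmark: I’m assuming a text-classification pipeline, and the model name, the `time_call` helper, and the input lengths are all placeholders — swap in whatever you’re actually running.

```python
import time
from transformers import pipeline

# Placeholder task/model -- replace with the pipeline you actually use.
pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def time_call(fn, *args, runs=10, warmup=2, **kwargs):
    """Time a callable over several runs, ignoring a couple of warmup calls."""
    for _ in range(warmup):
        fn(*args, **kwargs)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)

# Try a few input lengths to see how latency scales with your real workload.
for n_words in (8, 64, 256):
    text = "hello " * n_words
    best, mean = time_call(pipe, text)
    print(f"{n_words:4d} words: best {best * 1000:.1f} ms, mean {mean * 1000:.1f} ms")
```

Even something this crude will usually tell you whether tokenization, the forward pass, or something else entirely is eating your time.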

I totally get that it’s annoying to measure. I also often drag my feet before doing this. But I’m always glad I did. Otherwise, you might spend a bunch of effort speeding up one part a tiny bit, when the bottleneck is actually somewhere else!