Using Trainer at inference time

In terms of efficiency, the Trainer should be perfectly fine for inference. You may want to apply some inference-specific optimisations, though. See this post: Faster and smaller quantized NLP with Hugging Face and ONNX Runtime | by Yufeng Li | Microsoft Azure | Medium