How can I perform fast batch inference for NLLB model translation?

Batch inference of NLLB models with different source languages.

I need help running batch inference with NLLB models where the source language varies across examples. Pipeline inference is slow even on GPU: translating 10k examples in various languages one at a time took about 6 hours, while batch inference on 4.5k examples in a single language took about 1.6 hours.

Single GPU - Tesla T4
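Since the NLLB tokenizer takes a single `src_lang` setting per call, the approach I'm considering is to group examples by source language first and then translate each group in batches, restoring the original order at the end. A minimal sketch of that grouping logic (pure Python; `translate_batch` is a hypothetical placeholder for the actual tokenize-and-generate call):

```python
from collections import defaultdict

def group_by_src_lang(examples):
    """Group (src_lang, text) pairs into per-language buckets so each
    bucket can be tokenized once with the right src_lang setting."""
    buckets = defaultdict(list)
    for idx, (src_lang, text) in enumerate(examples):
        buckets[src_lang].append((idx, text))
    return buckets

def batched(items, batch_size):
    """Yield fixed-size chunks of a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def translate_all(examples, translate_batch, batch_size=32):
    """Translate mixed-language examples, preserving the input order.

    `translate_batch(src_lang, texts)` is a placeholder for the real
    model call (e.g. set tokenizer.src_lang, tokenize the batch, then
    run model.generate and decode).
    """
    results = [None] * len(examples)
    for src_lang, indexed in group_by_src_lang(examples).items():
        for chunk in batched(indexed, batch_size):
            indices, texts = zip(*chunk)
            outputs = translate_batch(src_lang, list(texts))
            for idx, out in zip(indices, outputs):
                results[idx] = out
    return results
```

Inside `translate_batch`, my understanding is that each bucket would be tokenized with the tokenizer's `src_lang` set to that bucket's language code, and the target language passed to `model.generate` via `forced_bos_token_id`, but I'm not sure whether this grouping approach is the fastest option on a T4.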