How to perform fast batch inference for NLLB Model translation?

reichenbach · January 13, 2023, 9:59am

Batch Inference of NLLB Models with different source languages.

I need help running batch inference with NLLB models when the source language can change from one example to the next. Pipeline inference is slow even on GPU: 10k examples in various languages (simple per-example inference) took about 6 hours, and batch inference on 4.5k examples in a single language took 1.6 hours.
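One common way to batch inputs whose source language varies is to group them by language and sort each group by length, so that every batch has a single source language and little padding. The sketch below is only an illustration of that idea and is not taken from this thread; the helper name `batches_by_language` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def batches_by_language(
    examples: List[Tuple[str, str]], batch_size: int = 32
) -> Iterator[Tuple[str, List[int], List[str]]]:
    """Yield (src_lang, original_indices, texts) with one language per batch."""
    by_lang: Dict[str, List[int]] = defaultdict(list)
    for idx, (lang, _text) in enumerate(examples):
        by_lang[lang].append(idx)

    for lang, indices in by_lang.items():
        # Sort by character length so each batch holds similarly sized inputs.
        indices.sort(key=lambda i: len(examples[i][1]))
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            yield lang, chunk, [examples[i][1] for i in chunk]


# Hypothetical sample data: (NLLB language code, text) pairs.
examples = [
    ("eng_Latn", "Hello world, how are you today?"),
    ("fra_Latn", "Bonjour"),
    ("eng_Latn", "Hi"),
]
for lang, indices, texts in batches_by_language(examples, batch_size=2):
    print(lang, indices, texts)
```

The returned indices let you write each translation back to its original position after the batches have been processed.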

Hardware: a single Tesla T4 GPU.

guillaumekln · January 30, 2023, 5:12pm

You can have a look at CTranslate2, which is a Python library for fast inference. It can convert NLLB models from Transformers. See an example here.
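For readers without the linked example at hand, here is a rough sketch of how mixed-language batches might be passed to CTranslate2. It assumes a checkpoint converted with `ct2-transformers-converter --model facebook/nllb-200-distilled-600M --output_dir nllb-200-distilled-600M`; the checkpoint name, the output directory, and the helper names `translate_mixed` and `load` are assumptions for illustration, not details from this thread. Because the NLLB tokenizer writes the source language token into each tokenized sentence, examples in different languages can share one `translate_batch` call.

```python
from typing import List


def translate_mixed(translator, tokenizer, texts: List[str],
                    src_langs: List[str], tgt_lang: str,
                    max_batch_size: int = 1024) -> List[str]:
    """Translate texts whose source language may differ per example."""
    sources = []
    for text, lang in zip(texts, src_langs):
        # The NLLB tokenizer inserts the source language token for us.
        tokenizer.src_lang = lang
        sources.append(tokenizer.convert_ids_to_tokens(tokenizer.encode(text)))

    results = translator.translate_batch(
        sources,
        target_prefix=[[tgt_lang]] * len(sources),  # force the target language
        max_batch_size=max_batch_size,              # counted in tokens (see batch_type)
        batch_type="tokens",
    )
    outputs = []
    for result in results:
        tokens = result.hypotheses[0][1:]  # drop the leading target language token
        outputs.append(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
    return outputs


def load(model_dir: str = "nllb-200-distilled-600M", device: str = "cuda"):
    """Load a converted model (assumed path); needs ctranslate2 and transformers."""
    import ctranslate2
    import transformers
    translator = ctranslate2.Translator(model_dir, device=device)
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-600M")
    return translator, tokenizer
```

A call would then look like `translate_mixed(*load(), ["Hello world!", "Bonjour"], ["eng_Latn", "fra_Latn"], "deu_Latn")`. The value of `max_batch_size` is a guess and would need tuning for a T4.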

Disclaimer: I’m the author of CTranslate2.

reichenbach · February 5, 2023, 4:42pm

@guillaumekln Thanks for the reply. Awesome repo, I looked around and found it useful for understanding the translation steps. Could you share an example that translates 50 or 100 examples in one go, so that I can time it as well? I tried adapting the example in your repo for batching but couldn’t figure it out.
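For the timing part, a small harness like the one below may be enough. It is a generic sketch: `time_batched` and `fake_translate` are hypothetical names, and the stand-in translator only uppercases text so the snippet runs without a model. Any function that maps a list of strings to a list of translations can be swapped in.

```python
import time
from typing import Callable, List, Tuple


def time_batched(translate_fn: Callable[[List[str]], List[str]],
                 texts: List[str], batch_size: int = 50) -> Tuple[List[str], float]:
    """Run translate_fn over texts in chunks and return (outputs, seconds)."""
    outputs: List[str] = []
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        outputs.extend(translate_fn(texts[i:i + batch_size]))
    return outputs, time.perf_counter() - start


# Stand-in translator so the sketch runs without a model; replace with a real one.
def fake_translate(batch: List[str]) -> List[str]:
    return [text.upper() for text in batch]


texts = [f"sentence {i}" for i in range(100)]
outputs, seconds = time_batched(fake_translate, texts, batch_size=50)
rate = len(outputs) / max(seconds, 1e-9)
print(f"{len(outputs)} examples in {seconds:.4f}s ({rate:.0f} examples/s)")
```

Comparing runs with `batch_size=1` and `batch_size=50` or `100` on the same texts gives a direct measure of how much batching helps.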

guillaumekln · February 6, 2023, 8:11am

Maybe this other tutorial is more helpful:

LiPengtao12138 · September 3, 2024, 7:25am

Do you have any new ideas regarding this issue now?
