Speeding up inference for MarianMT

Inference for a machine translation task using a pretrained model is very slow. Is there a way to speed up MarianMT inference, running the tokenizer and model on an NVIDIA GPU behind a Flask service?

What is the best way to feed input to the pretrained MarianMT model: a single string, a complete paragraph, or tuning the beam search?
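Regardless of the answer, batching several sentences per `generate()` call usually amortizes per-call overhead compared to translating one string at a time. A minimal sketch of a hypothetical helper (`make_batches` is not part of transformers) that splits a paragraph into sentences and groups them into fixed-size batches:

```python
# Hypothetical helper: split a paragraph into sentences, then group them
# into batches so model.generate() sees several sentences per call.
import re

def make_batches(paragraph, batch_size=8):
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', paragraph) if s.strip()]
    return [sentences[i:i + batch_size] for i in range(0, len(sentences), batch_size)]

batches = make_batches("Hola. Qué tal? Adiós.", batch_size=2)
print(batches)  # [['Hola.', 'Qué tal?'], ['Adiós.']]
```

Each inner list can then be tokenized and passed to the model in one call.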

Please find below the code I am using:

import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name).to(torch_device)
# src_text is a list of source-language sentences
translated = model.generate(**tokenizer.prepare_translation_batch(src_text).to(torch_device))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
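Two generic PyTorch inference habits that often help here are putting the model in eval mode and disabling gradient tracking (recent versions of `generate()` already disable gradients internally, but `eval()` and `no_grad()` matter for any explicit forward passes). A sketch, using a small `nn.Linear` as a stand-in for the MT model so it runs without downloading weights:

```python
# Sketch of generic PyTorch inference speedups: eval mode + torch.no_grad().
# A tiny nn.Linear stands in for MarianMTModel here.
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(8, 8).to(device).eval()  # stand-in for the MT model

x = torch.randn(4, 8, device=device)
with torch.no_grad():  # no autograd graph: less memory, faster forward pass
    out = model(x)

print(out.requires_grad)  # False: no gradient bookkeeping was done
```

The same `with torch.no_grad():` wrapper can surround the `model.generate(...)` call above.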

Hi,
Were you able to speed up the inference?

Hi,

You can consider using the CTranslate2 library, which can convert and run MarianMT models efficiently (up to 6x faster than Transformers on an NVIDIA Tesla T4). See a usage example here.
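For reference, a sketch of the workflow based on CTranslate2's Transformers integration (the output directory name is an assumption; the model must be converted once before use, and this requires the `ctranslate2` package and a GPU or CPU build):

```python
# Sketch of CTranslate2 usage for a MarianMT model. Convert once beforehand:
#   pip install ctranslate2
#   ct2-transformers-converter --model Helsinki-NLP/opus-mt-ROMANCE-en \
#       --output_dir opus-mt-ROMANCE-en-ct2
import ctranslate2
import transformers

translator = ctranslate2.Translator("opus-mt-ROMANCE-en-ct2", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ROMANCE-en")

# CTranslate2 works on token strings rather than raw text.
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hola, ¿cómo estás?"))
results = translator.translate_batch([source])
target = results[0].hypotheses[0]
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
```

`translate_batch` accepts a list of tokenized sentences, so several inputs can be translated in one call.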

Disclaimer: I’m the author of CTranslate2.


Hi @guillaumekln,
I see that in your example for BART you pass a tuple. Is there an option for the batch size, or a parameter that controls the number of sentences/tokens passed per call?