Translation takes too long from a fine-tuned mbart-large-50 model

I have fine-tuned facebook/mbart-large-50 for the Si-En language pair. When I try to translate 1950 sentences, whether as (1) one full batch or (2) with batch size 16, the process still crashes.

I then passed 16 lines per batch (i.e., as src_lines), and it takes considerable time.
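Roughly, the 16-lines-per-batch version is the loop below (a minimal sketch of what I'm running; tokenizer and model are loaded as in the full snippet further down, and I wrap it in torch.no_grad() since this is inference only):

import torch

trans_lines = []
with torch.no_grad():  # inference only, no gradients needed
    for i in range(0, len(src_lines), 16):
        # tokenize one batch of 16 source sentences
        batch_inputs = tokenizer(src_lines[i:i + 16], padding=True, truncation=True,
                                 max_length=100, return_tensors="pt")
        # generate translations, forcing English as the target language
        batch_tokens = model.generate(**batch_inputs,
                                      forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
        trans_lines.extend(tokenizer.batch_decode(batch_tokens, skip_special_tokens=True))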

For comparison, the fairseq fine-tuned checkpoint translates the entire file in about 2 minutes on the same machine.

Could you help me figure out how to reduce the translation time? My code is as follows. I'd highly appreciate your help.

from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

tokenizer = MBart50TokenizerFast.from_pretrained("mbart50-ft-si-en-run4", src_lang="si_LK", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("mbart50-ft-si-en-run4")  # same fine-tuned checkpoint

# read the source sentences; there are 1950 lines
src_lines = [line.strip() for line in open('data/parallel-27.04.2021-tu.un.sample10.si-en-ta.si', 'r', encoding='utf8')]

# tokenize all 1950 sentences as a single batch
model_inputs = tokenizer(src_lines, padding=True, truncation=True, max_length=100, return_tensors="pt")

generated_tokens = model.generate(
    **model_inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])

trans_lines = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)  # crashes