Translation takes too long from a fine-tuned mbart-large-50 model

I have fine-tuned facebook/mbart-large-50 for the Si-En language pair. When I try to translate 1950 sentences, whether as (1) one full batch or (2) with batch size 16, the process still crashes.

I then passed 16 lines per batch (i.e., as src_lines), and it takes considerable time.
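Roughly, the 16-lines-per-batch version is the loop below (a minimal sketch of what I'm running; tokenizer and model are loaded as in the full snippet further down, and I wrap it in torch.no_grad() since this is inference only):

import torch

trans_lines = []
with torch.no_grad():  # inference only, no gradients needed
    for i in range(0, len(src_lines), 16):
        # tokenize one batch of 16 source sentences
        batch_inputs = tokenizer(src_lines[i:i + 16], padding=True, truncation=True,
                                 max_length=100, return_tensors="pt")
        # generate translations, forcing English as the target language
        batch_tokens = model.generate(**batch_inputs,
                                      forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
        trans_lines.extend(tokenizer.batch_decode(batch_tokens, skip_special_tokens=True))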

For comparison, the fairseq fine-tuned checkpoint translates the entire file in about 2 minutes on the same machine.

Could you help me figure out how to reduce the translation time? My code is as follows. I'd highly appreciate your help.

from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

tokenizer = MBart50TokenizerFast.from_pretrained("mbart50-ft-si-en-run4", src_lang="si_LK", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("mbart50-ft-si-en-run4")  # same fine-tuned checkpoint

# read the source sentences; there are 1950 lines
src_lines = [line.strip() for line in open('data/parallel-27.04.2021-tu.un.sample10.si-en-ta.si', 'r', encoding='utf8')]

# tokenize all 1950 sentences as a single batch
model_inputs = tokenizer(src_lines, padding=True, truncation=True, max_length=100, return_tensors="pt")

generated_tokens = model.generate(
    **model_inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])

trans_lines = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)  # crashes