Hi everyone,
I wanted to know how we would finetune mBART on a summarization task in a language other than English. Also, how can we finetune mBART on a translation task where one of the languages is not present in the language code list that mBART has been trained on?
Appreciate any help!! Thank you.
Hi @LaibaMehnaz
DISCLAIMER: I haven't tried this myself, and as Sam found in his experiments, mBART doesn't always give good results.
For mBART, the source sequence ends with the source lang id and the target sequence starts with the target lang id. So for summarization you can pass the same lang id for both the source and target lang, and then finetune it the same way you finetune any other seq2seq model.
For translation, if the lang is not present in the list, you can try without using any lang id in the sequences.
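For the summarization case, something along these lines should work (an untested sketch; I'm using "hi_IN" just as an example lang code, and the exact `prepare_seq2seq_batch` keyword names can differ a bit between transformers versions):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

documents = ["..."]  # articles in your language
summaries = ["..."]  # reference summaries in the same language

# Same lang id on both sides, since input and output are in one language.
batch = tokenizer.prepare_seq2seq_batch(
    src_texts=documents,
    src_lang="hi_IN",
    tgt_texts=summaries,
    tgt_lang="hi_IN",
    return_tensors="pt",
)

outputs = model(**batch)  # batch has input_ids, attention_mask, labels
loss = outputs[0]         # first element is the loss when labels are passed
loss.backward()
```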
Hi @valhalla,
I did try using mBart without any lang id, but it gives me this error:
self.cur_lang_code = self.lang_code_to_id[src_lang]
KeyError: ''
Also, when I am using the same language code on both sides, the generations are in a totally different script.
For this, you'll need to tokenize the input and output sequences without using the prepare_seq2seq_batch method, or override prepare_seq2seq_batch and modify it to not use the lang id.
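Roughly something like this for the first option, i.e. tokenizing yourself instead of calling prepare_seq2seq_batch (untested sketch, the helper name is mine; it just appends </s> and skips the lang id the tokenizer would normally add):

```python
import torch
from transformers import MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")

def encode_without_lang_id(texts, max_length=512):
    # Tokenize and append only </s>, skipping the language id suffix.
    batch_ids = []
    for text in texts:
        ids = tokenizer.encode(text, add_special_tokens=False)[: max_length - 1]
        batch_ids.append(ids + [tokenizer.eos_token_id])
    # Pad to the longest sequence in the batch.
    max_len = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [tokenizer.pad_token_id] * (max_len - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch_ids]
    return torch.tensor(input_ids), torch.tensor(attention_mask)

src_ids, src_mask = encode_without_lang_id(["source document ..."])
tgt_ids, _ = encode_without_lang_id(["target summary ..."])
# feed src_ids/src_mask as input_ids/attention_mask and tgt_ids as labels
```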
I modified the tokenizer to not use the lang id as you suggested, but still the same problem. ROUGE is 0.0, as the generations are in another script.
Also, could it be because I am using tiny-mbart and not mbart-large-cc25? I was trying out tiny-mbart due to memory constraints.
Hi @LaibaMehnaz,
tiny-mbart is just meant for testing, it's a randomly initialised model. You could try to create a smaller student using the make_student.py script.
Oh, thanks a lot. I will proceed this way and let you know. Thanks again:)
Also, how many encoder and decoder layers would you suggest?
Hard to say, it depends on the problem. But you could start with the same number of encoder layers and 6 decoder layers; distilbart-12-6 performs really well on summarization.
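If you go the make_student.py route, the idea is roughly this (I haven't run it for mBART; the function name and the e/d keyword names are from memory, so check the actual signature in examples/seq2seq/make_student.py before relying on it):

```python
# Run from examples/seq2seq in the transformers repo, where make_student.py lives.
from make_student import create_student_by_copying_alternating_layers

# Keep all 12 encoder layers, copy 6 of the 12 decoder layers (a distilbart-12-6
# style student), and save it to save_path so you can finetune it afterwards.
create_student_by_copying_alternating_layers(
    teacher="facebook/mbart-large-cc25",
    save_path="student_mbart_12_6",
    e=12,
    d=6,
)
```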
Alright, thank you so much.
Hi, I am also interested in this topic and I am trying to add mBART functionality to another library, but I have run into something strange: https://huggingface.co/transformers/model_doc/mbart.html states that prepare_seq2seq_batch should give me a dict with these keys: [input_ids, attention_mask, decoder_input_ids, decoder_attention_mask], but it actually gives me [input_ids, attention_mask, labels]. I am a bit confused. Is it a bug or am I doing something wrong?
Hi @Zhylkaaa
The doc is incorrect. prepare_seq2seq_batch returns [input_ids, attention_mask, labels], and it's not a bug.
cc @sshleifer
Hi @valhalla
thanks for your response, but how am I supposed to create the decoder inputs? Because there is a difference in the lang_id position. Should I use something like:
[lang_id] + prepare_seq2seq_batch(decoder_input)['input_ids'][:-1] + padding if required?
Or should I just modify prepare_seq2seq_batch, throwing away the lang_id for the summarisation task? (I am not sure about this modification, because my intuition tells me that the lang_id is something like a language-conditioned [CLS] token, or is my intuition wrong again?)
Thanks!
I have read that I can tag @sshleifer for summarisation and BART problems/questions. Sorry if I am wrong.
You can keep the lang id for summarisation; you could pass the same lang id as src_lang and tgt_lang to the prepare_seq2seq_batch method.
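For example (untested, with "hi_IN" again just as a placeholder lang code):

```python
from transformers import MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["source document ..."],
    src_lang="hi_IN",
    tgt_texts=["target summary ..."],
    tgt_lang="hi_IN",
    return_tensors="pt",
)
print(list(batch.keys()))  # ['input_ids', 'attention_mask', 'labels']
```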
Thank you @valhalla,
but what about decoder_input_ids? Because I don't receive this value after I use prepare_seq2seq_batch.
finetune.py and finetune_trainer.py will make the right decoder_input_ids, you won't need to pass them.
Thanks, actually I've been digging through the source code and found that the forward method actually generates decoder_input_ids from labels through shift_tokens_right. Thank you for the help, and sorry for being annoying, I should have checked the source code first.
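For anyone who finds this later, the idea behind that shift is roughly this (my own sketch from reading the code, not a verbatim copy, so double-check against your transformers version): the last non-pad token of the labels, which is the target lang id in mBART's convention, gets wrapped around to the front, and everything else moves one position to the right.

```python
import torch

def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # Wrap the last non-pad token (the target lang id for mBART) to the front
    # and shift everything else one position to the right.
    prev_output_tokens = input_ids.clone()
    index_of_last = (input_ids.ne(pad_token_id).sum(dim=1) - 1).unsqueeze(-1)
    prev_output_tokens[:, 0] = input_ids.gather(1, index_of_last).squeeze(-1)
    prev_output_tokens[:, 1:] = input_ids[:, :-1]
    return prev_output_tokens

# Toy example with made-up ids: labels end with </s>=2 followed by a lang id
# (250004 here), so the decoder inputs start with that lang id.
labels = torch.tensor([[100, 101, 102, 2, 250004]])
print(shift_tokens_right(labels, pad_token_id=1))
# tensor([[250004,    100,    101,    102,      2]])
```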