@patrickvonplaten
In order to implement the PEGASUS pretraining objective ourselves, could we follow the same approach you suggested for mBART?
That is, adapting your example to the objective presented in the paper, it would become something like:
from transformers import PegasusTokenizer, PegasusForConditionalGeneration, PegasusConfig

# load the tokenizer from an existing checkpoint, but initialize the model from scratch
tok = PegasusTokenizer.from_pretrained("google/pegasus-large")
model = PegasusForConditionalGeneration(PegasusConfig())

# <mask_1> marks the masked gap sentence (GSG), <mask_2> the token-level MLM masks
input_string = "Pegasus is <mask_2> . <mask_1> it <mask_2> the model ."
decoder_input_string = "<s> It is pure white ."
labels_string = "It is pure white . <eos>"

input_ids = tok(input_string, add_special_tokens=False, return_tensors="pt").input_ids
decoder_input_ids = tok(decoder_input_string, add_special_tokens=False, return_tensors="pt").input_ids
labels = tok(labels_string, add_special_tokens=False, return_tensors="pt").input_ids

loss = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)[0]
Naturally, to automate this for pretraining, one would also implement the gap-sentence selection procedure over the dataset (masking the sentences with the top ROUGE scores against the rest of the document), as sketched below.
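For illustration, here is a minimal sketch of what that selection could look like, assuming the rouge_score package, pre-split sentences, the "Ind-Orig" principal-sentence strategy, and a 30% gap-sentence ratio (all of these are my assumptions, not a definitive implementation):

from rouge_score import rouge_scorer

def select_gap_sentences(sentences, gap_ratio=0.3):
    # Score each sentence independently against the rest of the document
    # (ROUGE1-F1, as in the "Ind-Orig" strategy) and return the indices of
    # the top-scoring sentences, in document order.
    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
    scored = []
    for i, sent in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scored.append((scorer.score(rest, sent)["rouge1"].fmeasure, i))
    n_gap = max(1, int(len(sentences) * gap_ratio))
    return sorted(i for _, i in sorted(scored, reverse=True)[:n_gap])

sentences = ["Pegasus is mythical .", "It is pure white .", "It names the model ."]
gap_idx = select_gap_sentences(sentences)
# Replace the selected sentences with <mask_1> in the encoder input and
# concatenate them to form the decoder target.
input_string = " ".join("<mask_1>" if i in gap_idx else s for i, s in enumerate(sentences))
labels_string = " ".join(sentences[i] for i in gap_idx)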
The loss computed this way would then go into the _step method, roughly as in the sketch below.
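A rough sketch of what that could look like in a PyTorch-Lightning-style module, assuming each batch already contains the GSG-masked inputs and the concatenated gap sentences as labels (self.model, self.tokenizer and the batch keys are illustrative names, not the exact interface of the examples script):

def _step(self, batch):
    # Illustrative only: `batch` is assumed to hold the masked encoder inputs
    # and the gap sentences as labels.
    labels = batch["labels"].clone()
    # Ignore padding positions in the cross-entropy loss.
    labels[labels == self.tokenizer.pad_token_id] = -100
    # decoder_input_ids are omitted here, assuming a transformers version
    # that builds them from the labels internally.
    outputs = self.model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=labels,
    )
    return outputs[0]  # the loss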
Is this reasonable, or am I missing something?