Hi! I’m currently working on fine-tuning GPT-2 for multilingual-to-English translation. My first thought is to use XLM-RoBERTa as the encoder and GPT-2 as the decoder, feeding GPT-2 the embeddings of the multilingual text produced by XLM-R, like this:
import torch.nn as nn
from transformers import XLMRobertaModel, GPT2LMHeadModel

class XLM2GPT2(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained(config['encoder'])
        self.decoder = GPT2LMHeadModel.from_pretrained(config['decoder'])

    def forward(self, ids, attns, labels):
        # take the encoder's hidden states, not the whole model output object
        txt_embeds = self.encoder(input_ids=ids, attention_mask=attns).last_hidden_state
        # the GPT-2 argument is "inputs_embeds" (not "input_embeds")
        output = self.decoder(inputs_embeds=txt_embeds, labels=labels)
        return output.loss
I don’t know whether this is the right way to build the model and do generative training. Would a structure like EncoderDecoderModel, which connects the two models through cross-attention instead of feeding encoder outputs in as decoder input embeddings, be a better way to implement it?
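For reference, what I mean by the EncoderDecoderModel route is roughly this (a minimal sketch; the checkpoint names are just the base checkpoints, and the token-id settings are my assumption about what training would need):

    from transformers import EncoderDecoderModel

    # ties XLM-R (encoder) and GPT-2 (decoder) together; GPT-2 is reloaded
    # with cross-attention layers added (randomly initialized)
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "xlm-roberta-base", "gpt2"
    )

    # seq2seq training/generation needs explicit start and pad token ids
    model.config.decoder_start_token_id = model.config.decoder.bos_token_id
    model.config.pad_token_id = model.config.encoder.pad_token_id

Training would then just be `model(input_ids=..., attention_mask=..., labels=...)`, where `labels` are the GPT-2-tokenized English targets, rather than my manual forward above.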
Thanks in advance!