I want to create an EncoderDecoderModel for a translation task using a Bert2Bert configuration, where the encoder model is pre-trained and frozen and the decoder model is randomly initialized. In the BertGeneration documentation, it says:
We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints.
Can I use a new, randomly initialized model for the BertGenerationDecoder, or is it better to use BertGeneration with pre-trained models? This is how I currently build the decoder:
from transformers import BertConfig, BertGenerationDecoder

# `tokenizer` and `hidden_size` are defined earlier in my setup.
model_config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=hidden_size,
    add_cross_attention=True,  # lets the decoder attend to the encoder outputs
    is_decoder=True,           # enables the causal attention mask
    bos_token_id=tokenizer.cls_token_id,
    eos_token_id=tokenizer.sep_token_id,
)
model = BertGenerationDecoder(config=model_config)  # weights are randomly initialized
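For context, this is roughly how I intend to pair that decoder with a frozen, pre-trained encoder. It is only a sketch of my plan, not tested training code: the checkpoint name is a placeholder, and it reuses the tokenizer, hidden_size, and model names from the snippet above.

from transformers import BertGenerationEncoder, EncoderDecoderModel

# Placeholder checkpoint; I would pick a BERT checkpoint whose hidden size
# matches the `hidden_size` used for the decoder config above.
encoder = BertGenerationEncoder.from_pretrained(
    "bert-base-cased",
    bos_token_id=tokenizer.cls_token_id,
    eos_token_id=tokenizer.sep_token_id,
)

# Freeze the pre-trained encoder so that only the decoder (including its
# cross-attention layers) receives gradient updates during training.
for param in encoder.parameters():
    param.requires_grad = False

# `model` is the randomly initialized BertGenerationDecoder from the snippet above.
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=model)
bert2bert.config.decoder_start_token_id = tokenizer.cls_token_id
bert2bert.config.pad_token_id = tokenizer.pad_token_id

Is this a reasonable way to set it up, or does the empirical study quoted above suggest the decoder should also start from a pre-trained checkpoint?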