I want to create an EncoderDecoderModel for a translation task using a Bert2Bert configuration, where the encoder is pre-trained and frozen and the decoder is randomly initialized. The BertGeneration documentation says:
> We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints.
Can I use a new, randomly initialized model as the BertGenerationDecoder, or is it better to use BertGeneration with pre-trained weights? Here is my current decoder setup:
```python
model_config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=hidden_size,
    add_cross_attention=True,
    is_decoder=True,
    bos_token_id=tokenizer.cls_token_id,
    eos_token_id=tokenizer.sep_token_id,
)
model = BertGenerationDecoder(config=model_config)
```
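For context, here is a minimal sketch of the full setup I have in mind: a frozen encoder combined with a randomly initialized decoder inside an `EncoderDecoderModel`. The tiny config sizes are placeholders so the snippet runs quickly; in a real run the encoder would instead come from a pre-trained checkpoint via `from_pretrained`, and the hyperparameters are assumptions, not recommendations.

```python
from transformers import (
    BertGenerationConfig,
    BertGenerationDecoder,
    BertGenerationEncoder,
    EncoderDecoderModel,
)

# Tiny configs so the sketch runs quickly. In practice the encoder would be
# loaded from a checkpoint, e.g.
#   encoder = BertGenerationEncoder.from_pretrained("bert-base-uncased")
# (checkpoint choice is an assumption here).
enc_config = BertGenerationConfig(
    vocab_size=30522,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)
dec_config = BertGenerationConfig(
    vocab_size=30522,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    is_decoder=True,           # required for a decoder
    add_cross_attention=True,  # attend to the encoder's hidden states
)

encoder = BertGenerationEncoder(enc_config)  # stands in for the pre-trained encoder
decoder = BertGenerationDecoder(dec_config)  # randomly initialized decoder

model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# Freeze the encoder so only the decoder (including its cross-attention
# and LM head) receives gradient updates during training.
for param in model.encoder.parameters():
    param.requires_grad = False
```

With this split, `model.parameters()` still iterates over everything, so the optimizer should be built from the trainable parameters only, e.g. `filter(lambda p: p.requires_grad, model.parameters())`.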