I want to create an EncoderDecoderModel for a translation task using a Bert2Bert configuration, where the encoder is pre-trained and frozen and the decoder is randomly initialized. The BertGeneration documentation says:
> We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints.
Can I use a new, randomly initialized model as the BertGenerationDecoder, or is it better to use BertGeneration with pre-trained weights? Here is my current decoder setup:
```python
model_config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=hidden_size,
    add_cross_attention=True,
    is_decoder=True,
    bos_token_id=tokenizer.cls_token_id,
    eos_token_id=tokenizer.sep_token_id,
)
model = BertGenerationDecoder(config=model_config)
```
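For context, here is a minimal sketch of the full setup I have in mind: a frozen encoder combined with a randomly initialized decoder inside an `EncoderDecoderModel`. The tiny config sizes are placeholders so the snippet runs quickly; in a real run the encoder would instead come from a pre-trained checkpoint via `from_pretrained`, and the hyperparameters are assumptions, not recommendations.

```python
from transformers import (
    BertGenerationConfig,
    BertGenerationDecoder,
    BertGenerationEncoder,
    EncoderDecoderModel,
)

# Tiny configs so the sketch runs quickly. In practice the encoder would be
# loaded from a checkpoint, e.g.
#   encoder = BertGenerationEncoder.from_pretrained("bert-base-uncased")
# (checkpoint choice is an assumption here).
enc_config = BertGenerationConfig(
    vocab_size=30522,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)
dec_config = BertGenerationConfig(
    vocab_size=30522,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    is_decoder=True,           # required for a decoder
    add_cross_attention=True,  # attend to the encoder's hidden states
)

encoder = BertGenerationEncoder(enc_config)  # stands in for the pre-trained encoder
decoder = BertGenerationDecoder(dec_config)  # randomly initialized decoder

model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# Freeze the encoder so only the decoder (including its cross-attention
# and LM head) receives gradient updates during training.
for param in model.encoder.parameters():
    param.requires_grad = False
```

With this split, `model.parameters()` still iterates over everything, so the optimizer should be built from the trainable parameters only, e.g. `filter(lambda p: p.requires_grad, model.parameters())`.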