Difference between EncoderDecoder and BertGeneration

Hey,

I want to train an EncoderDecoderModel with BERT as the base model and found this notebook.
In that notebook the model is created from two pretrained checkpoints and then configured:

from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

# BERT has no dedicated BOS/EOS tokens, so CLS and SEP are reused
bert2bert.config.decoder_start_token_id = tokenizer.cls_token_id
bert2bert.config.eos_token_id = tokenizer.sep_token_id
bert2bert.config.pad_token_id = tokenizer.pad_token_id
bert2bert.config.vocab_size = bert2bert.config.encoder.vocab_size

# generation defaults (beam search)
bert2bert.config.max_length = 142
bert2bert.config.min_length = 56
bert2bert.config.no_repeat_ngram_size = 3
bert2bert.config.early_stopping = True
bert2bert.config.length_penalty = 2.0
bert2bert.config.num_beams = 4
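
As I understand it, these config values are picked up as defaults by generate(); a minimal sketch of what I mean (the input text is just an example):

# sketch: the beam-search settings above apply here without being passed explicitly
inputs = tokenizer("some article text to summarize", return_tensors="pt")
output_ids = bert2bert.generate(inputs.input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))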

However, the documentation page for BertGeneration creates the model differently:

from transformers import BertGenerationEncoder, BertGenerationDecoder, EncoderDecoderModel

# 101 and 102 are the [CLS] and [SEP] token ids in the BERT vocabulary
encoder = BertGenerationEncoder.from_pretrained(
    "bert-large-uncased",
    bos_token_id=101,
    eos_token_id=102,
)
decoder = BertGenerationDecoder.from_pretrained(
    "bert-large-uncased",
    add_cross_attention=True,
    is_decoder=True,
    bos_token_id=101,
    eos_token_id=102,
)
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
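
I assume this combined model also needs the generation-related special tokens set on its top-level config, as in the first snippet; a minimal sketch of my assumption:

# my assumption: mirror the special-token config from the first snippet
bert2bert.config.decoder_start_token_id = 101  # [CLS] id in bert-large-uncased
bert2bert.config.eos_token_id = 102            # [SEP] id
bert2bert.config.pad_token_id = 0              # [PAD] id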

What is the exact difference between these models?
If I want to train the latter model, I need to remove the token_type_ids column from my dataset to avoid this error:
TypeError: BertGenerationEncoder.forward() got an unexpected keyword argument 'token_type_ids'
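
Concretely, I drop the column like this before training (a minimal sketch; tokenized_dataset is just a placeholder name for my tokenized datasets.Dataset):

# drop the column BertGenerationEncoder.forward() does not accept
tokenized_dataset = tokenized_dataset.remove_columns(["token_type_ids"])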

Thanks in advance!
julian

CC @patrickvonplaten