Hi Bachstelze,
Thanks for your interest! From what I can see in the GitHub issue, the challenge isn’t with the encoder-decoder architecture itself (which is what my tutorial covers), but rather with ModernBERT’s specific implementation in the Hugging Face transformers library. As Niels Rogge pointed out, ModernBERT currently doesn’t support cross-attention, which the decoder needs in order to attend to the encoder’s outputs in an encoder-decoder model.
If you’re looking to use ModernBERT specifically, you’d need to either:
- Wait for cross-attention support to be added to ModernBERT in the transformers library, or
- Consider using another BERT variant that already supports cross-attention (see the sketch below)
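
To illustrate the second option, here’s a minimal sketch using the standard `bert-base-uncased` checkpoint, which does support cross-attention when used as a decoder. This isn’t from my tutorial, just an assumed starting point: the cross-attention weights are newly initialized, so the generated text will be random until you fine-tune the model on your task.

```python
from transformers import BertTokenizer, EncoderDecoderModel

# Warm-start both encoder and decoder from a BERT checkpoint that supports
# cross-attention. The decoder config is set up with is_decoder=True and
# add_cross_attention=True automatically, which is what ModernBERT lacks today.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Generation needs these special token ids set explicitly.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("An example input sentence.", return_tensors="pt")
# Output will be nonsense until the model is fine-tuned, since the
# cross-attention layers start from random weights.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```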
If you’re interested in understanding how cross-attention works in encoder-decoder models, my tutorial might help explain the mechanics, even though it doesn’t specifically address the ModernBERT implementation.