Tutorial: Implementing Transformer from Scratch - A Step-by-Step Guide

Hi Bachstelze,
Thanks for your interest! From what I can see in the GitHub issue, the challenge isn’t with the encoder-decoder architecture itself (which is what my tutorial covers), but with ModernBERT’s current implementation in the Hugging Face transformers library. As Niels Rogge pointed out, ModernBERT doesn’t support cross-attention yet, which encoder-decoder models require.

If you’re looking to use ModernBERT specifically, you’d need to either:

  1. Wait for cross-attention support to be added to ModernBERT in the transformers library, or
  2. Consider using another BERT variant that already supports cross-attention (see the sketch just after this list)
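
For option 2, here’s a minimal, untested sketch of what that could look like with the library’s `EncoderDecoderModel`, which pairs a plain BERT checkpoint as both encoder and decoder and adds the cross-attention layers for you. The `bert-base-uncased` checkpoint is just a placeholder, not a recommendation:

```python
# Sketch: a BERT2BERT encoder-decoder via Hugging Face transformers.
# The decoder copy gets cross-attention layers added automatically.
from transformers import BertTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Generation needs these set explicitly for BERT-based decoders.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("A short input sentence.", return_tensors="pt")
generated = model.generate(
    inputs.input_ids, attention_mask=inputs.attention_mask, max_length=20
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

(Before fine-tuning, the decoder’s cross-attention weights are randomly initialized, so the generated text won’t be meaningful out of the box.)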

If you’re interested in understanding how cross-attention works in encoder-decoder models, my tutorial might help explain the mechanics, even though it doesn’t specifically address ModernBERT implementation.
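
In case it helps in the meantime, here’s a stripped-down PyTorch sketch of the core idea (not copied from the tutorial): in cross-attention, the queries come from the decoder states while the keys and values come from the encoder output, so each target position can attend over the source sequence.

```python
# Minimal single-head cross-attention, for illustration only.
import torch
import torch.nn.functional as F
from torch import nn


class CrossAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)  # queries from the decoder
        self.k_proj = nn.Linear(d_model, d_model)  # keys from the encoder
        self.v_proj = nn.Linear(d_model, d_model)  # values from the encoder
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, decoder_states, encoder_states):
        # decoder_states: (batch, tgt_len, d_model)
        # encoder_states: (batch, src_len, d_model)
        q = self.q_proj(decoder_states)
        k = self.k_proj(encoder_states)
        v = self.v_proj(encoder_states)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        weights = F.softmax(scores, dim=-1)  # (batch, tgt_len, src_len)
        return self.out_proj(weights @ v)


# Quick shape check with random tensors.
attn = CrossAttention(d_model=64)
out = attn(torch.randn(2, 5, 64), torch.randn(2, 9, 64))
print(out.shape)  # torch.Size([2, 5, 64])
```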
