I am trying to train a ModernBERT model using the HF stack. Since this is for a new language, I will also train the tokenizer (which is the easy part). ModernBERT uses dynamic padding and alternating global/local attention, and I haven't seen examples of support for either in HF. Am I missing something, or should I wait?
- Dynamic Padding: This is already supported out of the box; the standard Hugging Face data collators pad each batch to its longest sequence, so you don't need to do anything special here (see the collator sketch below).
- Global/Local Attention: There is no direct out-of-the-box support for this as a standalone, reusable mechanism, so in general you would need to subclass and modify the attention yourself; that said, see the config sketch below for what recent transformers releases expose for ModernBERT specifically.
- Training a Tokenizer: This is the easy part, as you've already pointed out. Make sure the tokenizer fits your new language and handles any tokenization nuances specific to it (see the retraining sketch below).
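For dynamic padding, a minimal sketch using the standard HF collator for MLM pretraining. The tokenizer path is a placeholder, and the 30% masking rate follows what the ModernBERT authors report; adjust both to your setup:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# "./my-tokenizer" is a placeholder path (e.g. the output of the
# tokenizer sketch further down).
tokenizer = AutoTokenizer.from_pretrained("./my-tokenizer")

# Pads each batch only to its own longest sequence (dynamic padding)
# and builds MLM labels on the fly.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.3,   # ModernBERT used 30% masking instead of 15%
    pad_to_multiple_of=8,  # optional; friendlier to tensor cores
)
```

Pass this as `data_collator=data_collator` to `Trainer`, and batches are padded per-batch rather than to a fixed `max_length`.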
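On the global/local attention point: newer transformers releases (v4.48+) ship ModernBERT model classes, where the alternating pattern is driven by the config rather than a custom subclass. A minimal sketch, assuming those classes are available in your installed version; the vocab size and window values here are illustrative, not prescriptive:

```python
from transformers import ModernBertConfig, ModernBertForMaskedLM

# The alternating attention pattern is controlled by the config:
# every Nth layer attends globally, the rest use a local sliding window.
config = ModernBertConfig(
    vocab_size=32768,              # match your new tokenizer
    global_attn_every_n_layers=3,  # one global layer every 3 layers
    local_attention=128,           # sliding-window size for local layers
)
model = ModernBertForMaskedLM(config)
```

If you are stuck on an older release, the subclassing route essentially amounts to building a band-shaped attention mask for the local layers and leaving the global layers untouched.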
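For the tokenizer, one low-friction option is to retrain the original ModernBERT tokenizer on your corpus so its special tokens and post-processing carry over. A sketch; `corpus.txt`, the vocab size, and the output path are placeholders:

```python
from transformers import AutoTokenizer

# Reuse the original tokenizer's setup (special tokens, pre-tokenizer,
# post-processing) and learn a new vocabulary from the target-language corpus.
base = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

def corpus_lines(path="corpus.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.strip()

tokenizer = base.train_new_from_iterator(corpus_lines(), vocab_size=32768)
tokenizer.save_pretrained("./my-tokenizer")  # the path loaded in the collator sketch
```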