I am trying to train a ModernBERT model using the HF stack. Since this is for a new language, I will also train the tokenizer (which is the easy part). ModernBERT uses dynamic padding and alternating global/local attention, and I haven't seen examples of support for either in HF. Am I missing something, or should I wait?
- Dynamic Padding: This is already supported out of the box; the standard Hugging Face data collators pad each batch to its longest sequence, so you don't need to do anything special here (see the collator sketch below).
- Global/Local Attention: There is no direct out-of-the-box support for this as a standalone, reusable mechanism, so in general you would need to subclass and modify the attention yourself; that said, see the config sketch below for what recent transformers releases expose for ModernBERT specifically.
- Training a Tokenizer: This is the easy part, as you've already pointed out. Make sure the tokenizer fits your new language and handles any tokenization nuances specific to it (see the retraining sketch below).
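For dynamic padding, a minimal sketch using the standard HF collator for MLM pretraining. The tokenizer path is a placeholder, and the 30% masking rate follows what the ModernBERT authors report; adjust both to your setup:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# "./my-tokenizer" is a placeholder path (e.g. the output of the
# tokenizer sketch further down).
tokenizer = AutoTokenizer.from_pretrained("./my-tokenizer")

# Pads each batch only to its own longest sequence (dynamic padding)
# and builds MLM labels on the fly.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.3,   # ModernBERT used 30% masking instead of 15%
    pad_to_multiple_of=8,  # optional; friendlier to tensor cores
)
```

Pass this as `data_collator=data_collator` to `Trainer`, and batches are padded per-batch rather than to a fixed `max_length`.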
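On the global/local attention point: newer transformers releases (v4.48+) ship ModernBERT model classes, where the alternating pattern is driven by the config rather than a custom subclass. A minimal sketch, assuming those classes are available in your installed version; the vocab size and window values here are illustrative, not prescriptive:

```python
from transformers import ModernBertConfig, ModernBertForMaskedLM

# The alternating attention pattern is controlled by the config:
# every Nth layer attends globally, the rest use a local sliding window.
config = ModernBertConfig(
    vocab_size=32768,              # match your new tokenizer
    global_attn_every_n_layers=3,  # one global layer every 3 layers
    local_attention=128,           # sliding-window size for local layers
)
model = ModernBertForMaskedLM(config)
```

If you are stuck on an older release, the subclassing route essentially amounts to building a band-shaped attention mask for the local layers and leaving the global layers untouched.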
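For the tokenizer, one low-friction option is to retrain the original ModernBERT tokenizer on your corpus so its special tokens and post-processing carry over. A sketch; `corpus.txt`, the vocab size, and the output path are placeholders:

```python
from transformers import AutoTokenizer

# Reuse the original tokenizer's setup (special tokens, pre-tokenizer,
# post-processing) and learn a new vocabulary from the target-language corpus.
base = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

def corpus_lines(path="corpus.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.strip()

tokenizer = base.train_new_from_iterator(corpus_lines(), vocab_size=32768)
tokenizer.save_pretrained("./my-tokenizer")  # the path loaded in the collator sketch
```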