I want to pretrain a BERT model as defined in HuggingFace Transformers, keeping all the architectural details and config, but re-initializing the vocabulary (removing the pre-defined one) and re-initializing the weights.

To be specific, I want to change the tokenization method and vocabulary while still using the BERT architecture. Is that possible? I couldn't find any useful material on this. Thanks so much!!
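To make the question concrete, here is a minimal sketch of what I have in mind, assuming a WordPiece tokenizer trained from scratch; the corpus file `my_corpus.txt`, the output directory `my_tokenizer`, and the vocab size are placeholders:

```python
from tokenizers import BertWordPieceTokenizer
from transformers import BertConfig, BertForMaskedLM, BertTokenizerFast

# Train a fresh WordPiece tokenizer on my own corpus (hypothetical path)
tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["my_corpus.txt"],
    vocab_size=30_000,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save_model("my_tokenizer")  # writes vocab.txt

# Wrap it in the fast tokenizer class so it plugs into the usual APIs
hf_tokenizer = BertTokenizerFast.from_pretrained("my_tokenizer")

# Default BertConfig gives the standard bert-base hyperparameters;
# only vocab_size is changed to match the new tokenizer.
# Constructing the model from a config (instead of from_pretrained)
# gives randomly initialized weights.
config = BertConfig(vocab_size=hf_tokenizer.vocab_size)
model = BertForMaskedLM(config)
```

Is this roughly the right approach, or is there something else I need to do to make the new vocabulary and the re-initialized model work together?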