Hello,
I’m building a custom vocabulary so I can train a BERT from scratch, and I was wondering whether it would make sense to train a GPT-style BPE tokenizer and use it with a BertModel.
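Here’s roughly what I have in mind, as a minimal sketch (the corpus path `corpus.txt` and the 30k vocab size are just placeholders): train a byte-level BPE with the `tokenizers` library, but register BERT’s special tokens so masking and sentence-pair inputs still work, then pair it with a randomly initialized BERT.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast, BertConfig, BertForMaskedLM

# 1. Train a GPT-style byte-level BPE on the raw corpus, adding the special
#    tokens BERT expects ([PAD], [CLS], [SEP], [MASK]) instead of GPT-2's.
bpe = Tokenizer(BPE(unk_token="[UNK]"))
bpe.pre_tokenizer = ByteLevel(add_prefix_space=False)
trainer = BpeTrainer(
    vocab_size=30_000,  # placeholder vocab size
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
bpe.train(files=["corpus.txt"], trainer=trainer)  # placeholder corpus path

# 2. Wrap it as a fast tokenizer so data collators can find the
#    special-token ids they expect.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=bpe,
    pad_token="[PAD]",
    unk_token="[UNK]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)

# 3. Build a BERT with a matching vocab size and pad id, trained from scratch.
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    pad_token_id=tokenizer.pad_token_id,
)
model = BertForMaskedLM(config)
```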
Has anyone done this kind of training with mismatched tokenizer and model types?
I’d appreciate any insights!