Using GPT's BPE tokenizer for BERT?


I’m training a custom vocabulary so I can train a BERT from scratch, and I was wondering whether it makes sense to train a GPT-style (byte-level) BPE tokenizer and pair it with a BertModel.
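Here’s roughly the setup I have in mind, using the Hugging Face `tokenizers` library (the corpus and vocab size are just placeholders): byte-level BPE like GPT-2, but with BERT’s special tokens and `[CLS] … [SEP]` framing so the result still plugs into a BertModel.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, processors, trainers
from transformers import PreTrainedTokenizerFast

# Placeholder corpus -- I'd stream the real training data here.
corpus = ["example sentence for tokenizer training"] * 100

# GPT-style byte-level BPE, but with BERT's special tokens in the vocab.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
trainer = trainers.BpeTrainer(
    vocab_size=30000,  # placeholder size
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# BERT expects [CLS] ... [SEP] framing, so add it as a post-processor.
tokenizer.post_processor = processors.TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", tokenizer.token_to_id("[CLS]")),
        ("[SEP]", tokenizer.token_to_id("[SEP]")),
    ],
)

# Wrap it so it plugs into BertModel like any other HF tokenizer.
bert_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    pad_token="[PAD]", unk_token="[UNK]", cls_token="[CLS]",
    sep_token="[SEP]", mask_token="[MASK]",
)
```

My understanding is that BertModel only ever sees token IDs, so as long as the embedding size matches the tokenizer’s vocab size and the special-token IDs are wired up consistently, the tokenization scheme itself shouldn’t matter to the architecture. Does that sound right?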

Has anyone done this kind of training with a mismatched tokenizer and model type?
I’d appreciate any insights!