Tokenizer vs Model

From what I understand from the HF tutorial, we should use a pretrained model with its own tokenizer for good performance. My doubt is that BERT uses WordPiece, but RoBERTa (again a BERT-style architecture) uses BPE as its tokenization approach. Can we mix and match any model and tokenizer if we are pretraining the model from scratch, as in the case of RoBERTa? In that case, can I pretrain a BERT/DistilBERT model from scratch using a BPE/Unigram tokenizer?

Does the rule of pairing a model with its own tokenizer apply only to fine-tuning and inference? Or is the architecture of each model itself tied to its tokenization approach?

I am trying to pretrain a DistilBERT model from scratch using the Unigram approach, roughly as in the sketch below.
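This is a minimal sketch of what I have in mind, not a working setup: train a Unigram tokenizer with the `tokenizers` library, wrap it as a `PreTrainedTokenizerFast`, and initialize a fresh `DistilBertForMaskedLM` with a matching vocab size. The `corpus.txt` file and the vocab size of 30000 are placeholders.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import PreTrainedTokenizerFast, DistilBertConfig, DistilBertForMaskedLM

# Train a Unigram tokenizer on a raw-text corpus (corpus.txt is a placeholder path)
tokenizer = Tokenizer(models.Unigram())
tokenizer.pre_tokenizer = pre_tokenizers.Metaspace()
trainer = trainers.UnigramTrainer(
    vocab_size=30000,  # placeholder vocab size
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
    unk_token="[UNK]",
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)

# Wrap it so it plugs into the transformers training pipeline
fast_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="[UNK]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)

# Randomly initialized DistilBERT with the vocab size taken from the new tokenizer
config = DistilBertConfig(vocab_size=fast_tokenizer.vocab_size)
model = DistilBertForMaskedLM(config)
```

The only hard coupling I can see is that the model's embedding size must match the tokenizer's vocab size, which is why the config is built from the tokenizer here.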
