Performance difference between ByteLevelBPE and Wordpiece tokenizers

dexhrestha · September 22, 2021, 2:13am

I pretrained two models for the Nepali language, which is written in the Devanagari script and is similar to Hindi. The first model was a DistilBERT model with a WordPiece tokenizer, and the second was a RoBERTa model with a ByteLevelBPE tokenizer. For the first model I used the OSCAR Nepali dataset, which is relatively small. Even within the first 50,000 optimization steps the model was performing really well: it could predict words based on the context. For the second model, with the ByteLevelBPE tokenizer, I used a bigger dataset with almost 700k lines, but it is not performing as well as the DistilBERT model. So, coming to my question, I have a few questions about the tokenizer and the model.

The WordPiece tokenizer normalized the words in the sentence. Why did this happen?
Is DistilBERT a better model for pretraining than RoBERTa, or am I getting bad results from the RoBERTa model because of the tokenizer? While studying these models I read that RoBERTa has more parameters and is a better model than DistilBERT.
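
To make both questions concrete, here is a minimal sketch using only the Python standard library. It rests on two assumptions that are not confirmed anywhere in this thread: that the WordPiece tokenizer was built with a BERT-style normalizer that strips accents (NFD normalization followed by dropping Unicode nonspacing marks, category `Mn`), and that the ByteLevelBPE tokenizer works on raw UTF-8 bytes. The function names `bert_style_strip_accents` and `utf8_len` are made up for illustration and are not part of any library.

```python
import unicodedata


def bert_style_strip_accents(text: str) -> str:
    """Emulate BERT-style accent stripping (an assumption about the setup):
    NFD-normalize, then drop every character in Unicode category Mn."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def utf8_len(text: str) -> int:
    """Number of bytes a byte-level tokenizer would see before any merges."""
    return len(text.encode("utf-8"))


# "नेपाली" spelled with escapes so the code points are unambiguous.
word = "\u0928\u0947\u092a\u093e\u0932\u0940"

stripped = bert_style_strip_accents(word)

# Some Devanagari vowel signs and the virama are nonspacing marks (Mn),
# so accent stripping removes them and changes the word.
print(word, "->", stripped)

# Each Devanagari code point takes three bytes in UTF-8, so the byte-level
# view of the same word is three times longer than the code-point view.
print(len(word), "code points,", utf8_len(word), "UTF-8 bytes")
```

If the normalization seen with WordPiece looks like missing vowel signs or a missing virama, accent stripping is a likely cause, and it may be worth checking whether the tokenizer's accent-stripping option can be turned off. The byte count is only one possible factor on the ByteLevelBPE side: the two runs also differ in model and dataset, so the comparison does not isolate the tokenizer.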
