Mismatched Tokenizer and LLM leading to odd evaluation result

kbmmoran · May 18, 2023, 2:52pm

Apologies, this is a two-part question.

Part A) I have a trained tokenizer and a trained LLM (from roberta-base). The LLM was trained using the trained tokenizer; however, the tokenizer was trained from a BERT tokenizer. Will the BERT/RoBERTa mismatch cause issues or loss of accuracy down the line?
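To make Part A concrete, here is a minimal sketch of the kind of sanity check that could expose a mismatch. It is not a confirmed diagnosis. The helper `check_tokenizer_model_match` is hypothetical, and the numbers are illustrative, modeled on the `bert-base-uncased` and `roberta-base` defaults rather than taken from my actual tokenizer and model. With real objects the inputs would come from `len(tokenizer)`, `tokenizer.pad_token_id`, `model.config.vocab_size` and `model.config.pad_token_id`.

```python
def check_tokenizer_model_match(tok_vocab_size, tok_special_ids,
                                model_vocab_size, model_special_ids):
    """Return a list of mismatch descriptions (empty if none are found)."""
    problems = []

    # Token ids at or above the embedding size would index out of range.
    if tok_vocab_size > model_vocab_size:
        problems.append(
            f"tokenizer vocab ({tok_vocab_size}) exceeds "
            f"model embedding rows ({model_vocab_size})"
        )

    # Special-token ids the tokenizer emits should agree with the model config.
    for name, tok_id in tok_special_ids.items():
        model_id = model_special_ids.get(name)
        if model_id is not None and model_id != tok_id:
            problems.append(
                f"{name}: tokenizer id {tok_id} != model config id {model_id}"
            )
    return problems


# Illustrative values only (assumed, not measured on my setup).
bert_style_tokenizer = {"vocab_size": 30522, "special": {"pad_token_id": 0}}
roberta_style_config = {
    "vocab_size": 50265,
    "special": {"pad_token_id": 1, "bos_token_id": 0, "eos_token_id": 2},
}

problems = check_tokenizer_model_match(
    bert_style_tokenizer["vocab_size"],
    bert_style_tokenizer["special"],
    roberta_style_config["vocab_size"],
    roberta_style_config["special"],
)
for p in problems:
    print(p)
```

If a check like this reports a padding-id disagreement, that seems worth ruling out first, since the model config and the tokenizer would then disagree about which positions are padding.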

Part B) After 3 epochs of training on ~15 million examples, my trained LLM slightly underperforms the out-of-the-box RoBERTa model on validation loss (1.25 vs. 1.30). However, anecdotal evidence from MLM examples suggests that the trained model has learned the specific language and performs better. Why would the out-of-the-box model still have a lower loss through trainer.evaluate()? Is this simply not enough epochs for a big dataset, or could it be related to Part A? Could catastrophic forgetting be to blame, which would explain why the custom LLM outperforms RoBERTa on the examples I care about while RoBERTa has a lower val loss?
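For a sense of scale, the two validation losses can be converted to perplexities. This is only a sketch: it assumes the reported loss is a mean cross-entropy in nats, and the names `loss_a` and `loss_b` are placeholders for the two figures quoted above. It also says nothing about whether the two numbers are comparable in the first place, which would additionally require that both models were scored on the same masked positions with comparable tokenization.

```python
import math


def perplexity(eval_loss):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(eval_loss)


# The two validation losses quoted in Part B.
loss_a = 1.25
loss_b = 1.30

ppl_a = perplexity(loss_a)
ppl_b = perplexity(loss_b)

# Relative gap in perplexity between the two runs.
relative_gap = ppl_b / ppl_a - 1.0

print(f"perplexity a: {ppl_a:.3f}")
print(f"perplexity b: {ppl_b:.3f}")
print(f"relative gap: {relative_gap:.1%}")
```

A 0.05 difference in loss corresponds to roughly a 5% difference in perplexity, so the gap is small enough that it may be worth checking how much the loss moves between evaluation runs with different random masks.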
