Accuracy after further pretraining is worse than without pretraining

My current task is to classify the association between CVEs and CWEs. I have noticed that fine-tuning directly from BertModel.from_pretrained('bert-base-uncased') gives higher accuracy than first further pretraining on additional CVE-related descriptions and then fine-tuning from the saved model.pt. I don't understand why this happens, as I have already ruled out compatibility issues with the model. Note that from the pretraining phase I only carry the model weights over into fine-tuning; the tokenizer in both stages is BertTokenizer.from_pretrained('bert-base-uncased'). I did not retrain or extend the tokenizer during pretraining because that would be very time-consuming.
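Roughly, the two setups look like the sketch below (this is only an illustration: the path 'model.pt' comes from my description above, and the assumption that it holds the encoder's state_dict, plus the use of strict=False, are simplifications):

import torch
from transformers import BertModel, BertTokenizer

# Same tokenizer in both setups (never retrained or extended)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Setup A: fine-tune directly from the public checkpoint
model_a = BertModel.from_pretrained('bert-base-uncased')

# Setup B: fine-tune from the further-pretrained weights
# (assumes model.pt stores the encoder state_dict saved after MLM pretraining;
#  strict=False only in case the file also contains the MLM head)
model_b = BertModel.from_pretrained('bert-base-uncased')
model_b.load_state_dict(torch.load('model.pt', map_location='cpu'), strict=False)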

Here are the hyperparameters I am using:

batch_size = 16
num_epochs = 10
learning_rate = 1e-4
eps = 1e-8
beta1 = 0.9
beta2 = 0.99
weight_decay = 0.01
total_steps = num_epochs * len(train_loader)
warmup_steps = total_steps // 10
early_stopping_patience = 2
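
To show how these values fit together, here is a minimal sketch assuming an AdamW optimizer with transformers' linear warmup schedule, which is what these names suggest; model stands for whichever BERT variant is being fine-tuned:

from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

optimizer = AdamW(
    model.parameters(),
    lr=learning_rate,           # 1e-4
    eps=eps,                    # 1e-8
    betas=(beta1, beta2),       # (0.9, 0.99)
    weight_decay=weight_decay,  # 0.01
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,   # 10% of total_steps
    num_training_steps=total_steps,
)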

Additionally, the settings for masked language modeling (MLM) are:

mask_prob=0.15
replace_mask_prob=0.8
random_replace_prob=0.10
keep_original_prob=0.10
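
These are the standard BERT 80/10/10 masking ratios. For reference, a minimal sketch of the masking step they describe, following the same pattern as Hugging Face's DataCollatorForLanguageModeling (the function name mask_tokens and the assumption that input_ids is a padded LongTensor batch are mine):

import torch

def mask_tokens(input_ids, tokenizer, mask_prob=0.15,
                replace_mask_prob=0.8, random_replace_prob=0.10):
    # mask_prob of non-special tokens become prediction targets; of those,
    # replace_mask_prob are swapped for [MASK], random_replace_prob for a
    # random token, and the rest (keep_original_prob) are left unchanged.
    labels = input_ids.clone()

    probability_matrix = torch.full(labels.shape, mask_prob)
    special_tokens_mask = torch.tensor(
        [tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)
         for ids in labels.tolist()],
        dtype=torch.bool,
    )
    probability_matrix.masked_fill_(special_tokens_mask, value=0.0)
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # loss is only computed on masked positions

    # replace_mask_prob (80%) of targets -> [MASK]
    indices_replaced = (torch.bernoulli(torch.full(labels.shape, replace_mask_prob)).bool()
                        & masked_indices)
    input_ids[indices_replaced] = tokenizer.mask_token_id

    # random_replace_prob (10%) of targets -> random vocabulary token
    remaining_prob = random_replace_prob / (1.0 - replace_mask_prob)  # 0.10 / 0.20 = 0.5
    indices_random = (torch.bernoulli(torch.full(labels.shape, remaining_prob)).bool()
                      & masked_indices & ~indices_replaced)
    random_words = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)
    input_ids[indices_random] = random_words[indices_random]

    # keep_original_prob (10%) of targets keep their original token
    return input_ids, labels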

I hope someone can help me understand why this happens. If more detailed code is needed, I can provide it. Thank you.