Hello guys,
I’ve trained a BERT model from scratch using BertForMaskedLM and the Trainer API. When I use AutoModelForSequenceClassification to fine-tune it for a text classification task, I get a warning about weight initialization. Is it normal to get a warning like the one below, or am I doing something wrong?
Some weights of the model checkpoint at ./cased/bert-wikidump-50mb-mlm/model were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at ./cased/bert-wikidump-50mb-mlm/model and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.weight', 'classifier.bias']
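From what I understand of the message, the MLM head (the cls.predictions.* weights) is dropped and only the pooler and classifier are freshly initialized, while the shared encoder should still come from my checkpoint. Here is a rough sketch of how I’d double-check that (num_labels=5 is just a placeholder here):
import torch
from transformers import BertForMaskedLM, AutoModelForSequenceClassification

checkpoint = "./cased/bert-wikidump-50mb-mlm/model"
mlm_model = BertForMaskedLM.from_pretrained(checkpoint)
clf_model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=5)  # placeholder num_labels

# The shared encoder weights (e.g. the word embeddings) should be identical,
# because they are loaded from the same checkpoint in both cases.
print(torch.equal(
    mlm_model.bert.embeddings.word_embeddings.weight,
    clf_model.bert.embeddings.word_embeddings.weight,
))  # expected: True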
Loading the pre-trained model with AutoModelForSequenceClassification
from transformers import AutoModelForSequenceClassification, AdamW, AutoConfig
# num_labels is set to the number of distinct classes in the dataframe
config = AutoConfig.from_pretrained(PATHS["model"]["cased"]["local"], num_labels=df.category.unique().size)
model = AutoModelForSequenceClassification.from_pretrained(PATHS["model"]["cased"]["local"], config=config)
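As a sanity check after loading, the classifier head is a brand-new Linear layer sized by num_labels, which matches the “newly initialized” part of the warning (assuming the default hidden size of 768, since I only changed vocab_size when pre-training):
print(model.classifier)          # Linear(in_features=768, out_features=<num_labels>, bias=True)
print(model.config.num_labels)   # should equal df.category.unique().size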
Code for training BERT from scratch with only the MLM task
from transformers import (
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Fresh BERT config; only the vocabulary size differs from the defaults
config = BertConfig(vocab_size=64_000)
model = BertForMaskedLM(config=config)

# Dynamic masking for the MLM objective (15% of tokens masked)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=bert_cased_tokenizer, mlm=True, mlm_probability=0.15
)
training_args = TrainingArguments(
    output_dir=PATHS["model"]["cased"]["training"]["local"],
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=8,  # 512 max sequence length, 64 sequence count
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
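The training call itself is not shown above; after setting up the Trainer I run it and save the checkpoint that the classification code loads later (the exact save path here is my assumption of how the two paths line up):
trainer.train()
trainer.save_model(PATHS["model"]["cased"]["local"])  # assumed: same path later passed to AutoModelForSequenceClassification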