I recently began using a RoBERTa-large model, on which I'm doing downstream training with the `Trainer` API.
Everything goes well: I see the loss going down, and I manually compared some results on the validation dataset.
The problem starts when I try to save the model and reload it afterwards. I keep getting this warning when reloading:
```
Some weights of the model checkpoint at Roberta_trained_1epoch were not used when initializing RobertaPreTrainedModel: ['module.roberta.encoder.layer.10.output.dense.bias', […340_LAYERS_…], 'module.roberta.encoder.layer.6.attention.self.key.bias', 'module.roberta.encoder.layer.22.output.dense.weight', 'module.roberta.encoder.layer.3.attention.self.key.bias', 'module.roberta.encoder.layer.15.attention.self.value.bias', 'module.roberta.encoder.layer.15.attention.self.query.bias', 'module.roberta.encoder.layer.2.attention.self.value.bias']
```
I looked extensively for an answer to why this happens, and so far I couldn't find a solution. Some claim this is just a warning and nothing is actually wrong, but that seemed suspicious, so I did some manual checks, and indeed the model comes back… freshly initialized.
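One of those manual checks was inspecting the keys of the saved state dict directly. Here is a minimal sketch of that check, assuming `Trainer.save_model()` wrote a `pytorch_model.bin` into the output directory (newer `transformers` versions may write `model.safetensors` instead):

```python
import torch

# Load the raw state dict that Trainer.save_model() wrote to disk.
state_dict = torch.load('save_here/pytorch_model.bin', map_location='cpu')

# The keys in the warning all carry a 'module.' prefix, which is the prefix
# torch.nn.DataParallel adds when it wraps a model.
print(list(state_dict.keys())[:5])
```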
I'm saving with `trainer.save_model('save_here')` after training, and reloading with `RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True)`. However, the results show that the model is clearly not loading correctly.
Training code:

```python
from transformers import EarlyStoppingCallback, Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=ds_train,
    eval_dataset=ds_valid,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
trainer.evaluate()
trainer.save_model('save_here')
```
This results in an evaluation loss of 0.002.
Reloading and re-evaluating:
```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, RobertaForTokenClassification

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained('tokenizers_saved')
model.to(device)
model.eval()

dl_valid = DataLoader(ds_valid, batch_size=Config.batch_size, shuffle=True)

predictions, reals = [], []
eval_loss = 0.0
with torch.no_grad():
    for index, data in enumerate(dl_valid):
        batch_input_ids = data['input_ids'].to(device, dtype=torch.long)
        batch_att_mask = data['attention_mask'].to(device, dtype=torch.long)
        batch_target = data['label_ids'].to(device, dtype=torch.long)
        output = model(batch_input_ids, token_type_ids=None, attention_mask=batch_att_mask, labels=batch_target)
        step_loss, eval_prediction = output['loss'], output['logits']
        eval_prediction = np.argmax(eval_prediction.detach().to('cpu').numpy(), axis=2)
        predictions.append(eval_prediction)
        reals.append(batch_target)
        eval_loss += step_loss.item()
print(eval_loss)
```
This results in a loss between 0.9 and 1.2, varying randomly after each reload.
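For reference, this is the kind of side-by-side check that makes me say the model comes back untrained. It's a sketch only: the attribute path on the trained side is an assumption and depends on whether the model was wrapped (e.g. by `DataParallel`, in which case the trained weights live under `trainer.model.module` instead):

```python
import torch

# Compare one tensor from the still-in-memory trained model against the same
# tensor in the reloaded model; after a correct save/load round trip the two
# should be identical.
trained_w = trainer.model.roberta.encoder.layer[0].output.dense.weight
reloaded_w = model.roberta.encoder.layer[0].output.dense.weight

# In my runs this comes out False, i.e. the reloaded weights are not the
# trained ones.
print(torch.allclose(trained_w.detach().cpu(), reloaded_w.detach().cpu()))
```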