Not able to reload all weights after training

I recently began using a RoBERTa-large model, on which I perform downstream training using the Trainer API.
Training goes well: I see the loss going down, and I manually compare some results against the validation dataset.

The problem arises when I try to save the model and reload it afterwards.
I keep seeing this warning when reloading the model:

Some weights of the model checkpoint at Roberta_trained_1epoch were not used when initializing RobertaPreTrainedModel: ['module.roberta.encoder.layer.10.output.dense.bias', […340_LAYERS_…]
'module.roberta.encoder.layer.6.attention.self.key.bias', 'module.roberta.encoder.layer.22.output.dense.weight', 'module.roberta.encoder.layer.3.attention.self.key.bias', 'module.roberta.encoder.layer.15.attention.self.value.bias', 'module.roberta.encoder.layer.15.attention.self.query.bias', 'module.roberta.encoder.layer.2.attention.self.value.bias']

I've searched extensively for an explanation of this problem and so far couldn't find a solution. Some claim this is just a warning and nothing is wrong, but I was suspicious, so I did some manual checks, and the model does indeed seem… freshly initialized.
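
For reference, this is roughly the kind of manual check I mean (a minimal sketch, assuming the checkpoint was saved as save_here/pytorch_model.bin; newer transformers versions write model.safetensors instead):

import torch
from transformers import RobertaForTokenClassification

# Raw state dict as written by trainer.save_model()
checkpoint = torch.load('save_here/pytorch_model.bin', map_location='cpu')
# Model rebuilt from the same directory
reloaded = RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True)

# The saved keys carry a 'module.' prefix, so they don't line up
# with the reloaded model's parameter names
print(list(checkpoint.keys())[:3])
print(list(reloaded.state_dict().keys())[:3])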

I'm calling trainer.save_model('save_here') after training, and RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True) to reload it.

However, the results clearly show that the model is not loading correctly.
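
One way to capture the mismatch programmatically instead of scraping the warning (a sketch using from_pretrained's output_loading_info flag, not what I originally ran):

from transformers import RobertaForTokenClassification

model, loading_info = RobertaForTokenClassification.from_pretrained(
    'save_here', local_files_only=True, output_loading_info=True
)
print(loading_info['unexpected_keys'][:5])  # the 'module.'-prefixed names from the warning
print(loading_info['missing_keys'][:5])     # parameters left at their fresh initialization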

Training code:

from transformers import Trainer, EarlyStoppingCallback

# model, training_args, compute_metrics, ds_train and ds_valid are defined earlier
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=ds_train,
    eval_dataset=ds_valid,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
trainer.evaluate()
trainer.save_model('save_here')

This results in an evaluation loss of 0.002.

Reloading and re-evaluation:

import numpy as np
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, RobertaForTokenClassification

model = RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained('tokenizers_saved')

# ds_valid, Config and device are defined earlier in the script
dl_valid = DataLoader(ds_valid, batch_size=Config.batch_size, shuffle=True)

model.to(device)
model.eval()

predictions, reals = [], []
eval_loss = 0.0

with torch.no_grad():
    for index, data in enumerate(dl_valid):
        batch_input_ids = data['input_ids'].to(device, dtype=torch.long)
        batch_att_mask = data['attention_mask'].to(device, dtype=torch.long)
        batch_target = data['label_ids'].to(device, dtype=torch.long)

        output = model(batch_input_ids, token_type_ids=None, attention_mask=batch_att_mask, labels=batch_target)

        step_loss, eval_prediction = output['loss'], output['logits']
        eval_prediction = np.argmax(eval_prediction.detach().to('cpu').numpy(), axis=2)

        predictions.append(eval_prediction)
        reals.append(batch_target)

        eval_loss += step_loss.item()

print(eval_loss)

This results in a loss between 0.9 and 1.2, varying randomly from one reload to the next.
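
To confirm it's the weights themselves that differ, and not something in the eval loop, one can compare a single trained tensor against its reloaded counterpart (a sketch; trainer.model is assumed to be the bare model here, so if it's wrapped in DataParallel go through trainer.model.module instead):

import torch

# Compare one parameter from the in-memory trained model with the reloaded one;
# prints False if the reload silently dropped the trained weights
w_trained = trainer.model.roberta.encoder.layer[0].output.dense.weight
w_reloaded = model.roberta.encoder.layer[0].output.dense.weight
print(torch.allclose(w_trained.detach().cpu(), w_reloaded.detach().cpu()))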