XLNetForSequenceClassification warnings

Karthik12 · September 12, 2020, 11:43am

Hi,

In a Google Colab notebook, I install transformers (!pip install transformers) and import the XLNetForSequenceClassification model. When I instantiate the model the first time (before training), I get the message below:

Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForSequenceClassification: ['lm_loss.weight', 'lm_loss.bias']

This IS expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
This IS NOT expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLNetForSequenceClassification were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['sequence_summary.summary.weight', 'sequence_summary.summary.bias', 'logits_proj.weight', 'logits_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

After training, I save the model’s state_dict using torch.save(). When I load this model for inference (using torch.load), I get the same messages as above.

Why would I get the messages post training?

xin · September 23, 2020, 8:09am

I have exactly the same problem with the XLNet QuestionAnsweringModel when I use the pretrained xlnet-base-cased model. It was working well about two months ago, and I did not change anything.

If I ignore this warning, the training loss is very large in every epoch.

Is there a known cause or solution for this?

BramVanroy · September 23, 2020, 8:20am

@Karthik12 @xin Please post the code that you use to: 1. load the model before training, 2. save the model after training, and 3. load the saved model.

xin · September 23, 2020, 8:47am

load the model before training:

from simpletransformers.question_answering import QuestionAnsweringModel
import torch

train_args = {
    'learning_rate': 3e-5,
    'num_train_epochs': 3,  ###
    'max_seq_length': 384,
    'doc_stride': 384,
    'max_query_length': 64,
    'max_answer_length': 100,
    'n_best_size': 3,
    'early_stopping_consider_epochs': True,
    'overwrite_output_dir': False,  #####
    'reprocess_input_data': False,
    'gradient_accumulation_steps': 8,
    'use_early_stopping': True,
    'evaluate_during_training': True,
    'save_eval_checkpoints': True,
    'save_model_every_epoch': True,
    'save_steps': 2000,
    'n_gpu': 2,  ###
    'train_batch_size': 4,
    'dataloader_num_workers': 8,  ###
    'early_stopping_delta': 0.01,
    'early_stopping_metric': 'eval_loss',
    'early_stopping_metric_minimize': True,
    'early_stopping_patience': 3,
    'evaluate_during_training_steps': 1000,
    'mem_len': 1024,  ### XLNet
}

cuda_available = torch.cuda.is_available()

model = QuestionAnsweringModel('xlnet', 'xlnet-base-cased', args=train_args, use_cuda=cuda_available)

Then, I got the following warning:

Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForQuestionAnswering: ['lm_loss.weight', 'lm_loss.bias']

This IS expected if you are initializing XLNetForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).

This IS NOT expected if you are initializing XLNetForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLNetForQuestionAnswering were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['start_logits.dense.weight', 'start_logits.dense.bias', 'end_logits.dense_0.weight', 'end_logits.dense_0.bias', 'end_logits.LayerNorm.weight', 'end_logits.LayerNorm.bias', 'end_logits.dense_1.weight', 'end_logits.dense_1.bias', 'answer_class.dense_0.weight', 'answer_class.dense_0.bias', 'answer_class.dense_1.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

the code that you use to save the model after training:

model.train_model(train_data, output_dir="/media/bizon/DATA/QA/model_saved")

the code that you use to load the saved model:
This is from my earlier working code, since the current XLNet setup does not work.

model = QuestionAnsweringModel('xlnet', '/content/drive/MyDrive/transformers/apex/outputs/checkpoint-24435-epoch-3')

=================================
The environment I am using:

Ubuntu 18.04.3 LTS

The libraries I used are:

Name: transformers
Version: 3.1.0

Name: simpletransformers
Version: 0.48.3

Name: torch
Version: 1.5.0

BramVanroy · September 23, 2020, 9:35am

Sorry, I am not familiar with simpletransformers so I cannot help you further. I do not have the time to dig into how that library works.

xin · September 23, 2020, 9:54am

Thank you anyway. I guess it is not because of simpletransformers.

I hope @Karthik12 can share his code if he did not use simpletransformers but still got the same warning.

Karthik12 · September 23, 2020, 11:16am

I retried with BertForSequenceClassification and got a message similar to the one I posted initially:

# BertClass
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, RandomSampler, TensorDataset
from transformers import BertForSequenceClassification, BertTokenizer, get_linear_schedule_with_warmup

class BertClassification(torch.nn.Module):
    def __init__(self, num_labels=1):
        super(BertClassification, self).__init__()
        self.num_labels = num_labels
        # from_pretrained is called every time this wrapper is instantiated
        self.bert = BertForSequenceClassification.from_pretrained(
            'bert-base-uncased',
            num_labels=self.num_labels,
            output_attentions=False,
            output_hidden_states=False,
        )

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
        if labels is None:
            logits = self.bert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=None)
            return logits
        else:
            loss, logits = self.bert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=labels)
            return loss, logits

# Create data to process for Bert
train_sentences = X_train['text'].values
train_sentences = [sentence for sentence in train_sentences]
train_labels = Y_train['label'].values

# Tokenize the texts
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

# Lists that collect the encoded inputs
input_ids = []
attention_masks = []

for sent in train_sentences:
    encoded_sent = tokenizer.encode_plus(
        text=sent,
        add_special_tokens=True,
        max_length=max_len,
        padding='max_length',
        return_attention_mask=True,
        truncation=True
    )
    # Add the outputs to the lists
    input_ids.append(encoded_sent.get('input_ids'))
    attention_masks.append(encoded_sent.get('attention_mask'))

# Convert lists to tensors
train_inputs = torch.tensor(input_ids)
train_masks = torch.tensor(attention_masks)
train_labels = torch.tensor(train_labels, dtype=torch.long, device=device)

#Create Iterators for Train and Valid
train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

model = BertClassification(num_labels=2)
optimizer = AdamW(model.parameters(), lr = 1e-5, eps = 1e-8)
total_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = total_steps)

#Train Model
model = train(model=model, num_epochs=num_epochs,optimizer=optimizer,scheduler=scheduler, train_dataloader=train_dataloader, valid_dataloader=validation_dataloader)

#Save Model
model_save = model.module if hasattr(model, 'module') else model
checkpoint = {'epochs': epochs, 'state_dict': model_save.state_dict()}
torch.save(checkpoint, save_path)

#Load Model for inference:
model = load_model(model_save_path)

Colab message:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']

This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

BramVanroy · September 23, 2020, 11:51am

Can you post the code for load_model? You should probably use

model.save_pretrained(save_dir)
model = BertClassification.from_pretrained(save_dir)

where BertClassification subclasses transformers’ PreTrainedModel.

Karthik12 · September 23, 2020, 11:55am

load_model function does this:

checkpoint = torch.load(save_path)
model_state_dict = checkpoint['state_dict']
model = BertClassification(num_labels=num_labels)
model.load_state_dict(model_state_dict)

BramVanroy · September 23, 2020, 12:34pm

Okay, so after some digging:

This is expected behaviour.
When you run load_model, you re-load the original pretrained bert-base-uncased weights because you reinitialize the model. This line is run again: BertForSequenceClassification.from_pretrained('bert-base-uncased', ...)
The message appears because "bert-base-uncased" contains the weights for BertForPreTraining, which has the following submodules:

github.com/huggingface/transformers/blob/28cf873036d078b47fb9dd38ac3421a7c874da44/src/transformers/modeling_bert.py#L858-L865


class BertForPreTraining(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.bert = BertModel(config)
        self.cls = BertPreTrainingHeads(config)

        self.init_weights()

The warning tells you that the original checkpoint has weights (heads) that BertForSequenceClassification does not have. The final classifier layer in BertForSequenceClassification is randomly initialized and does not get its weights from the pretrained model's seq_relationship head. As said, this warning will appear every time you call BertForSequenceClassification.from_pretrained("bert-base-uncased").
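To make the two lists in the warning more concrete, here is a rough sketch of the comparison behind them. It is not the library's actual code: diff_keys is a hypothetical helper, and the key lists are abbreviated from the Colab message earlier in this thread. The idea is only that checkpoint keys with no counterpart in the model are reported as "not used", and model keys with no counterpart in the checkpoint are reported as "newly initialized".

```python
def diff_keys(checkpoint_keys, model_keys):
    """Hypothetical helper: compare weight names of a checkpoint and a model."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    # In the checkpoint but absent from the model -> "were not used"
    unused = sorted(checkpoint_keys - model_keys)
    # In the model but absent from the checkpoint -> "newly initialized"
    newly_initialized = sorted(model_keys - checkpoint_keys)
    return unused, newly_initialized


# Abbreviated key names, taken from the warning text in this thread
pretraining_checkpoint = [
    "bert.embeddings.word_embeddings.weight",
    "cls.predictions.bias",
    "cls.seq_relationship.weight",
    "cls.seq_relationship.bias",
]
classification_model = [
    "bert.embeddings.word_embeddings.weight",
    "classifier.weight",
    "classifier.bias",
]

unused, newly_initialized = diff_keys(pretraining_checkpoint, classification_model)
print("not used:", unused)
print("newly initialized:", newly_initialized)
```

The shared encoder weights match on both sides, so only the pretraining heads (cls.*) and the classification head (classifier.*) show up in the lists.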

The correct and most Pythonic way is to subclass PreTrainedModel and use the save_pretrained and from_pretrained methods directly.
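For illustration, a minimal sketch of what such a subclass could look like. Several things here are assumptions rather than anything from the code posted above: the tiny BertConfig values are made up so the example runs without downloading weights, the single linear head is just one possible design, and self.init_weights() follows the style of the BertForPreTraining snippet (newer transformers releases may prefer post_init()).

```python
import tempfile

import torch
from torch import nn
from transformers import BertConfig, BertModel, BertPreTrainedModel


class BertClassification(BertPreTrainedModel):
    """Illustrative subclass: a BERT encoder plus a linear classification head."""

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        pooled = outputs[1]  # pooled [CLS] representation
        return self.classifier(pooled)


# Tiny made-up config so that nothing has to be downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertClassification(config).eval()
input_ids = torch.tensor([[1, 5, 7, 2]])

with tempfile.TemporaryDirectory() as save_dir:
    # Inherited from PreTrainedModel, no overriding needed
    model.save_pretrained(save_dir)
    reloaded = BertClassification.from_pretrained(save_dir).eval()
    with torch.no_grad():
        same_logits = torch.allclose(model(input_ids), reloaded(input_ids))

print("reloaded model gives the same logits:", same_logits)
```

Because the head is part of the saved directory, loading from save_dir restores it along with the encoder instead of initializing it again.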

Karthik12 · September 23, 2020, 12:42pm

Thank you @BramVanroy. A few things:

1. Subclass the PreTrainedModel: how can I do this? I assume it is not the same as the below?
BertSequenceClassification = BertForSequenceClassification.from_pretrained('bert-base-uncased')
2. Use save_pretrained: I will try the below:
model.save_pretrained(str)
3. Load the pretrained model directly: I will try the below:
model.from_pretrained(str)

Karthik12 · September 23, 2020, 1:05pm

Ok, I think I got what you said.

I will subclass the PreTrainedModel class and override its save and load methods (save_pretrained and from_pretrained) to change their behavior. Let me try.

BramVanroy · September 23, 2020, 1:22pm

No, you do not need to override those methods. They should work without needing to change anything. However, now that I take a closer look at your code, why do you need a separate model? Why don't you just use the BertForSequenceClassification model itself? You do not add any layers, right? So I think you can use this:

num_labels = 1
config = AutoConfig.from_pretrained("bert-base-uncased",
                                    num_labels=num_labels,
                                    output_attentions=False,
                                    output_hidden_states=False)
bert = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

# train model here...

# Saving/loading using built-in functionality
bert.save_pretrained(save_dir)
# Load the correct weights directly
bert = BertForSequenceClassification.from_pretrained(save_dir,
                                                     num_labels=num_labels,
                                                     output_attentions=False,
                                                     output_hidden_states=False)

# ...or using your own save/load method
checkpoint = {"epochs": epochs, "state_dict": bert.state_dict()}
torch.save(checkpoint, save_path)

checkpoint = torch.load(save_path)
# No from_pretrained here, so we don't unnecessarily load weights twice
bert = BertForSequenceClassification(config)
bert.load_state_dict(checkpoint["state_dict"])
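As a quick sanity check of the second approach, here is a small self-contained sketch. The tiny BertConfig values, the file name checkpoint.pt, and the epoch count are made up for the example, and the randomly initialized "trained" model merely stands in for a fine-tuned one. It builds the model from the config alone (so no pretrained checkpoint is read and no warning is triggered) and confirms that every saved tensor comes back unchanged.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny made-up config; in practice this would come from AutoConfig.from_pretrained(...)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)

# Stand-in for the fine-tuned model
trained = BertForSequenceClassification(config)

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "checkpoint.pt")
    torch.save({"epochs": 3, "state_dict": trained.state_dict()}, save_path)
    checkpoint = torch.load(save_path, map_location="cpu")

# Build the architecture from the config only, then fill in the saved weights
restored = BertForSequenceClassification(config)
restored.load_state_dict(checkpoint["state_dict"])

restored_state = restored.state_dict()
weights_match = all(torch.equal(tensor, restored_state[name])
                    for name, tensor in trained.state_dict().items())
print("all weights restored:", weights_match)
```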

Karthik12 · September 23, 2020, 1:30pm

So the messages appear because I instantiate the model again in the load_model function, as you said. True, since I am not adding extra layers, I could do the above.

Thank you, this helps to understand the approach.

david-waterworth · December 23, 2020, 12:05am

@BramVanroy is this expected? It confused me as well. I trained a model by adapting run_mlm.py and then fine-tuned using another script I adapted from run_glue.py (these scripts are generic, they use Auto*)

If I load the fine-tuned model using AutoModel.from_pretrained(), it returns the model as a RobertaModel, which produces the warning. Using AutoModelForSequenceClassification.from_pretrained() returns the same model as a RobertaForSequenceClassification.

Since my model config contains

"architectures": [
    "RobertaForSequenceClassification"
],

I originally expected that AutoModel would return this class but I guess not.
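The behaviour described here can be reproduced without any trained weights. The sketch below is only an illustration: the tiny RobertaConfig values are made up, and it uses from_config instead of from_pretrained to avoid a download, on the assumption that both pick the class the same way. Even with "architectures" set in the config, each Auto class returns the class from its own mapping.

```python
from transformers import AutoModel, AutoModelForSequenceClassification, RobertaConfig

# Tiny made-up config that declares a sequence-classification architecture
config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                       num_attention_heads=2, intermediate_size=64,
                       max_position_embeddings=64,
                       architectures=["RobertaForSequenceClassification"])

# AutoModel maps to the bare encoder, regardless of config.architectures
base_name = type(AutoModel.from_config(config)).__name__
# The task-specific Auto class maps to the model with the classification head
head_name = type(AutoModelForSequenceClassification.from_config(config)).__name__

print(base_name, head_name)
```

So to get the fine-tuned head back, the task-specific Auto class is the one to use.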

facehugger2020 · December 23, 2020, 12:45am

Wow. This thread is really complex.

Here’s what I do to serialize my model after training:

torch.save(model.state_dict(), save_model_filename)

Here’s what I do to deserialize my model:

model = AutoModel.from_pretrained(model_name)
model.load_state_dict(torch.load(save_model_filename))

I've used the above code successfully. Let me know if I've done anything wrong.
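One caveat that may be worth checking: this pattern relies on the saved state_dict coming from the same class that is instantiated at load time. The sketch below uses a made-up tiny BertConfig and randomly initialized models purely for illustration. It loads a state_dict from a model with a classification head into the bare encoder, passing strict=False so that the mismatches are returned instead of raising an error.

```python
from transformers import BertConfig, BertForSequenceClassification, BertModel

# Tiny made-up config so that nothing has to be downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)

# State dict saved from a model WITH a classification head...
state_dict = BertForSequenceClassification(config).state_dict()

# ...loaded into the bare encoder, which is what AutoModel would return
base = BertModel(config)
result = base.load_state_dict(state_dict, strict=False)

has_mismatch = bool(result.missing_keys or result.unexpected_keys)
print("missing:", len(result.missing_keys), "unexpected:", len(result.unexpected_keys))
```

With the default strict=True the same call raises a RuntimeError, so if the load succeeds without complaints, the classes on both sides matched.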

PremalMatalia · April 3, 2021, 3:12pm

@Karthik12 Was this issue solved by the solution provided by @BramVanroy? I am getting the same warning, and the model is not behaving correctly.

Same question has been asked in below:
