XLNetForSequenceClassification warnings

Hi,

In a Google Colab notebook, I install transformers (!pip install transformers) and import the XLNetForSequenceClassification model. When I instantiate the model the first time (before training), I get the following:

Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForSequenceClassification: ['lm_loss.weight', 'lm_loss.bias']

  • This IS expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
  • This IS NOT expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of XLNetForSequenceClassification were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['sequence_summary.summary.weight', 'sequence_summary.summary.bias', 'logits_proj.weight', 'logits_proj.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

After training, I save the model’s state_dict using torch.save(). When I load this model for inference (using torch.load), I get the same messages as above.

Why would I get these messages after training?


I got the exact same problem for XLNet's QuestionAnsweringModel when I use the pretrained xlnet-base-cased model. It was working well about 2 months ago and I didn't change anything.

When I ignore this warning, the training loss is huge in each epoch.

Is there a known reason or solution for this?

@Karthik12 @xin Please post: 1. the code that you use to load the model before training; 2. the code that you use to save the model after training; 3. the code that you use to load the saved model.

  1. load the model before training:

    from simpletransformers.question_answering import QuestionAnsweringModel
    import torch

    train_args = {
        'learning_rate': 3e-5,
        'num_train_epochs': 3,
        'max_seq_length': 384,
        'doc_stride': 384,
        'max_query_length': 64,
        'max_answer_length': 100,
        'n_best_size': 3,
        'early_stopping_consider_epochs': True,
        'overwrite_output_dir': False,
        'reprocess_input_data': False,
        'gradient_accumulation_steps': 8,
        'use_early_stopping': True,
        'evaluate_during_training': True,
        'save_eval_checkpoints': True,
        'save_model_every_epoch': True,
        'save_steps': 2000,
        'n_gpu': 2,
        'train_batch_size': 4,
        'dataloader_num_workers': 8,
        'early_stopping_delta': 0.01,
        'early_stopping_metric': 'eval_loss',
        'early_stopping_metric_minimize': True,
        'early_stopping_patience': 3,
        'evaluate_during_training_steps': 1000,
        'mem_len': 1024,  # XLNet-specific
    }

    cuda_available = torch.cuda.is_available()

    model = QuestionAnsweringModel('xlnet', 'xlnet-base-cased', args=train_args, use_cuda=cuda_available)

Then, I got the following warning:

Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForQuestionAnswering: ['lm_loss.weight', 'lm_loss.bias']

  • This IS expected if you are initializing XLNetForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
  • This IS NOT expected if you are initializing XLNetForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of XLNetForQuestionAnswering were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['start_logits.dense.weight', 'start_logits.dense.bias', 'end_logits.dense_0.weight', 'end_logits.dense_0.bias', 'end_logits.LayerNorm.weight', 'end_logits.LayerNorm.bias', 'end_logits.dense_1.weight', 'end_logits.dense_1.bias', 'answer_class.dense_0.weight', 'answer_class.dense_0.bias', 'answer_class.dense_1.weight']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  2. the code that you use to save the model after training:

model.train_model(train_data, output_dir="/media/bizon/DATA/QA/model_saved")

  3. the code that you use to load the saved model:
    This is from my previous working code, since the current XLNet setup doesn't work.

model = QuestionAnsweringModel('xlnet', '/content/drive/MyDrive/transformers/apex/outputs/checkpoint-24435-epoch-3')

=================================
The environment I am using:

Ubuntu 18.04.3 LTS

The libraries I used are:

Name: transformers
Version: 3.1.0

Name: simpletransformers
Version: 0.48.3

Name: torch
Version: 1.5.0

Sorry, I am not familiar with simpletransformers so I cannot help you further. I do not have the time to dig into how that library works.

Thank you anyway. I guess it is not because of simpletransformers.

Hopefully @Karthik12 can share his code if he didn't use simpletransformers but still got the same warning.

I retried with BertForSequenceClassification and got a message similar to the one I posted initially:

# BertClass

class BertClassification(torch.nn.Module):
    def __init__(self, num_labels=1):
        super(BertClassification, self).__init__()
        self.num_labels = num_labels
        self.bert = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=self.num_labels, output_attentions=False, output_hidden_states=False)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
        if labels is None:
            logits = self.bert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=None)
            return logits
        else:
            loss, logits = self.bert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=labels)
            return loss, logits

# Create data to process for Bert
train_sentences = X_train['text'].values
train_sentences = [sentence for sentence in train_sentences]
train_labels = Y_train['label'].values

# Tokenize the texts
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

input_ids = []
attention_masks = []
for sent in train_sentences:
    encoded_sent = tokenizer.encode_plus(
        text=sent,
        add_special_tokens=True,
        max_length=max_len,
        padding='max_length',
        return_attention_mask=True,
        truncation=True
    )
    # Add the outputs to the lists
    input_ids.append(encoded_sent.get('input_ids'))
    attention_masks.append(encoded_sent.get('attention_mask'))

#Convert lists to tensors
train_inputs = torch.tensor(input_ids)
train_masks = torch.tensor(attention_masks)
train_labels = torch.tensor(train_labels, dtype=torch.long, device=device)

#Create Iterators for Train and Valid
train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

model = BertClassification(num_labels=2)
optimizer = AdamW(model.parameters(), lr = 1e-5, eps = 1e-8)
total_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = total_steps)

#Train Model
model = train(model=model, num_epochs=num_epochs,optimizer=optimizer,scheduler=scheduler, train_dataloader=train_dataloader, valid_dataloader=validation_dataloader)

#Save Model
model_save = model.module if hasattr(model, 'module') else model
checkpoint = {'epochs': epochs, 'state_dict': model_save.state_dict()}
torch.save(checkpoint, save_path)


#Load Model for inference:
model = load_model(model_save_path)

Colab message:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']

  • This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
  • This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Can you post the code for load_model? You should probably use

model.save_pretrained(save_dir)
model = BertClassification.from_pretrained(save_dir)

where BertClassification subclasses transformers’ PreTrainedModel.
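
In case it helps, here is a minimal sketch of what such a subclass could look like. This is not the exact code from this thread: it assumes transformers 3.x, and the class name MyBertClassifier and the directory names are just placeholders.

import torch
from transformers import BertModel, BertPreTrainedModel

class MyBertClassifier(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.bert = BertModel(config)
        self.classifier = torch.nn.Linear(config.hidden_size, config.num_labels)
        self.init_weights()  # randomly initializes the new head; pretrained weights are loaded on top afterwards

    def forward(self, input_ids, attention_mask=None, token_type_ids=None, labels=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        logits = self.classifier(outputs[1])  # outputs[1] is the pooled [CLS] representation
        if labels is not None:
            loss = torch.nn.functional.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            return loss, logits
        return logits

# model = MyBertClassifier.from_pretrained("bert-base-uncased", num_labels=2)  # warning about the new head, once
# ... train ...
# model.save_pretrained("my_model_dir")
# model = MyBertClassifier.from_pretrained("my_model_dir")  # head weights restored, no warning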

The load_model function does this:

checkpoint = torch.load(save_path)
model_state_dict = checkpoint['state_dict']
model = BertClassification(num_labels=num_labels)
model.load_state_dict(model_state_dict)

Okay, so after some digging:

  • This is expected behaviour.
  • When you run load_model you re-load the original pretrained bert-base-uncased because you reinitialize the model. This line is run again: BertForSequenceClassification.from_pretrained('bert-base-uncased', ...)
  • The message that you get is because "bert-base-uncased" contains the weights for BertForPreTraining, which has the base bert encoder plus the cls pretraining head (the cls.predictions.* and cls.seq_relationship.* submodules listed in the warning).

The warning that you get tells you that the original model has weights (heads) that BertForSequenceClassification does not have. The final classifier layer in BertForSequenceClassification is randomly initialized and does not get the weights from the pretrained model's seq_relationship head. As said, this warning will appear every time you use BertForSequenceClassification.from_pretrained("bert-base-uncased").
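
If you want to double-check that your fine-tuned weights really are restored despite the warning, you can inspect what load_state_dict reports. A minimal sketch; save_path and num_labels are whatever you used in your load_model function above.

# The warning comes from from_pretrained() while BertClassification is being constructed;
# load_state_dict() then overwrites every parameter with your fine-tuned checkpoint.
checkpoint = torch.load(save_path, map_location="cpu")
model = BertClassification(num_labels=num_labels)         # warning is printed here
result = model.load_state_dict(checkpoint["state_dict"])  # fine-tuned weights restored here
print(result.missing_keys, result.unexpected_keys)        # both empty -> nothing was left randomly initialized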

The correct, and most Pythonic, way is to subclass PreTrainedModel and use the save_pretrained and from_pretrained methods directly, as in the sketch above.


Thank you @BramVanroy. A few things:

  1. Subclass PreTrainedModel - how can I do this, as I assume it is not the same as the below?
    BertSequenceClassification = BertForSequenceClassification.from_pretrained('bert-base-uncased')

  2. Use save_pretrained: I will try the below:
    model.save_pretrained(str)

  3. Use from_pretrained directly: I will try the below:
    model.from_pretrained(str)

Ok, I think I got what you said.

I will subclass the PreTrainedModel class and override its methods (save_pretrained and from_pretrained) to change their behavior. Let me try.

No, you do not need to override those methods. They should work without needing to change anything. However, now that I take a better look at your code, why do you need a separate model? Why don't you just use the BertForSequenceClassification model itself? You do not add any layers, right? So I think you can use this:

num_labels = 1
config = AutoConfig.from_pretrained("bert-base-uncased",
                                    num_labels=num_labels,
                                    output_attentions=False,
                                    output_hidden_states=False)
bert = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

# train model here...

# Saving/loading using built-in functionality
bert.save_pretrained(save_dir)
# Load the correct weights directly
bert = BertForSequenceClassification.from_pretrained(save_dir,
                                                     num_labels=num_labels,
                                                     output_attentions=False,
                                                     output_hidden_states=False)

# ...or using your own save/load method
checkpoint = {"epochs": epochs, "state_dict": model_save.state_dict()}
torch.save(checkpoint, save_path)

checkpoint = torch.load(save_path)
# NO from_pretrained so we don't unnecessarily load weights twice
bert = BertForSequenceClassification(config)
bert.load_state_dict(checkpoint["state_dict"])

The fact that I instantiate the model (again) in the load_model function gives me the messages, as you said. True, as I am not adding extra layers, I could do the above.

Thank you, this helps to understand the approach.


@BramVanroy is this expected? It confused me as well. I trained a model by adapting run_mlm.py and then fine-tuned using another script I adapted from run_glue.py (these scripts are generic, they use Auto*).

If I load the fine-tuned model using AutoModel.from_pretrained(), it returns the model as a RobertaModel, which produces the warning. Using AutoModelForSequenceClassification.from_pretrained() returns the same model as a RobertaForSequenceClassification.

Since my model config contains

"architectures": [
  "RobertaForSequenceClassification"
],

I originally expected that AutoModel would return this class but I guess not.
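
For reference, here is a small sketch of what I observe; the model path is hypothetical.

from transformers import AutoModel, AutoModelForSequenceClassification

# AutoModel maps the config to the base class: a RobertaModel is returned,
# the classification head is dropped, and the "weights not used" warning is printed.
base = AutoModel.from_pretrained("path/to/finetuned-roberta")

# The task-specific auto class restores the head: a RobertaForSequenceClassification
# is returned and no warning is printed.
clf = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-roberta")

print(type(base).__name__, type(clf).__name__)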

Wow. This thread is really complex.

Here’s what I do to serialize my model after training:

torch.save(model.state_dict(), save_model_filename)

Here’s what I do to deserialize my model:

model = AutoModel.from_pretrained(model_name)
model.load_state_dict(torch.load(save_model_filename))

I've used the above code successfully. Let me know if I've done anything wrong.
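
One hedged variation, assuming the model is a plain transformers model and with a placeholder directory name: saving with save_pretrained avoids fetching the base checkpoint again and avoids the warnings about unused or newly initialized weights at load time.

from transformers import AutoModel

# After training: save the fine-tuned weights and config with the built-in method
model.save_pretrained("finetuned_dir")

# At inference: load the fine-tuned weights directly from that directory;
# nothing is re-downloaded and no warning is printed
model = AutoModel.from_pretrained("finetuned_dir")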

@Karthik12 Is this issue solved by the solution provided by @BramVanroy? I am also getting the same warning and the model is not behaving correctly.

The same question has been asked below: