Weights not downloading


When I first ran:

from transformers import BertModel
model = BertModel.from_pretrained('bert-base-cased')

it works fine.

But when I then run:

from transformers import BertForSequenceClassification
m = BertForSequenceClassification.from_pretrained('bert-base-cased')

I get warning messages:

Some weights of the model checkpoint at bert-base-cased were not used when 
initializing BertForSequenceClassification: ['cls.predictions.bias', 
'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 
'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 
'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']

- This IS expected if you are initializing BertForSequenceClassification from the 
checkpoint of a model trained on another task or with another architecture (e.g. 
initializing a BertForSequenceClassification model from a BertForPreTraining model).

Some weights of BertForSequenceClassification were not initialized from the model 
checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 

There is another topic regarding the same issue in the forum here.

My understanding is that the first snippet downloaded the weights of the bare pre-trained bert-base-cased model, and that when I ran the second snippet for sequence classification, the weights for sequence classification were not downloaded because the model took its checkpoint from the first snippet.

The last paragraph of the warning message says the same thing.

So how do I download the pre-trained weights for sequence classification, or for other tasks in general?



Can anyone help me with this?


Hey @tejan-rgb. I’ve never faced this before. Have you tried loading another model, just to make sure it works? If it does, then maybe the problem is with bert-base-cased.

Hi @tejan-rgb! This is the expected behavior. bert-base-cased is pretrained without a head for downstream tasks such as classification; its goal is to learn powerful general representations of language. Nothing failed to download: the pretraining weights listed in the first warning (cls.*) are simply not used by BertForSequenceClassification, and the classification head (classifier.*) is newly initialized because the checkpoint contains no weights for it. The message is suggesting that you “fine-tune” this task-specific head on your own data and labels. If you have data you’d like to specialize this model to, you can ignore the warning messages and train the model. If you want an out-of-the-box model for a sequence classification task, Hugging Face’s community has already fine-tuned a lot of models, so you’ll need to download the weights of the model that best suits your needs here.
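If it helps to see this without downloading anything, here is a minimal sketch. It assumes transformers and PyTorch are installed. The tiny config sizes are arbitrary choices for illustration, and the checkpoint is a randomly initialized stand-in for a pretraining checkpoint, not bert-base-cased itself. It saves a model with pretraining heads, loads it into BertForSequenceClassification, and asks from_pretrained to report which weights it could not find in the checkpoint.

```python
import tempfile

from transformers import (
    BertConfig,
    BertForPreTraining,
    BertForSequenceClassification,
)

# Tiny config so the example runs quickly and offline.
# These sizes are arbitrary assumptions, not the bert-base-cased values.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    # Stand-in for a pretraining checkpoint: the encoder plus the
    # pretraining heads stored under "cls.*".
    BertForPreTraining(config).save_pretrained(checkpoint_dir)

    # Load the same checkpoint into a classification model, as in the question.
    # output_loading_info=True also returns a dict describing the key matching.
    model, info = BertForSequenceClassification.from_pretrained(
        checkpoint_dir, num_labels=2, output_loading_info=True
    )

# Weights the classification model needs but the checkpoint does not contain.
missing = sorted(info["missing_keys"])
# Weights the checkpoint contains but the classification model has no use for.
unexpected = sorted(info["unexpected_keys"])

print("newly initialized:", missing)
print("not used:", unexpected)
```

The encoder weights are loaded from the checkpoint either way; only the classifier head starts from random values, which is why it has to be trained before the predictions mean anything. With the real checkpoint, the equivalent call would be BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=...), followed by fine-tuning on your labeled data. To skip fine-tuning, pass the name of a checkpoint that was already fine-tuned for sequence classification instead.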
