Message "Some layers from the model were not used"

Hi,

I have a local Python 3.8 conda environment with tensorflow and transformers installed via pip (because conda does not provide transformers for Python 3.8).
But I keep getting warning messages like “Some layers from the model checkpoint at (model-name) were not used when initializing (…)”.

Even running the first simple example from the quick tour page generates two of these warnings (although slightly different), as shown below.

Code:

from transformers import pipeline
classifier = pipeline('sentiment-analysis')

Output:

Downloading: 100% 629/629 [00:11<00:00, 52.5B/s]
Downloading: 100% 268M/268M [00:11<00:00, 23.9MB/s]

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertModel: ['pre_classifier', 'classifier', 'dropout_19']
- This IS expected if you are initializing TFDistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertModel were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.

Downloading: 100% 232k/232k [00:02<00:00, 111kB/s]
Downloading: 100% 230/230 [00:01<00:00, 226B/s]

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_38']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

My configuration (‘transformers-cli env’ output):

2020-11-10 21:32:33.799767: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-11-10 21:32:33.804571: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From c:\users\basvdw\miniconda3\envs\lm38\lib\site-packages\transformers\commands\env.py:36: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-11-10 21:32:37.029143: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-10 21:32:37.049021: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x154dca447d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-10 21:32:37.055558: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-10 21:32:37.061622: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2020-11-10 21:32:37.065536: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-11-10 21:32:37.074543: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: S825
2020-11-10 21:32:37.080321: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: S825

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 3.5.0
- Platform: Windows-10-10.0.18362-SP0
- Python version: 3.8.6
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): 2.3.1 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No

Does anyone know what causes these messages and how I could fix them? I do not really understand the warning, because I thought I was using a pre-trained model which doesn’t need any more training…

Any help would be appreciated!

Tagging @jplu so he’s aware; this looks like a bug (but you can proceed safely: the bug is that there is a warning when there should be none).

This is not a bug, this is normal behavior. It is just a warning saying that you are not using all the saved weights from the checkpoint, so they are ignored.

Yes, but this is a model fine-tuned for the task, so the warning should not appear.

The logs say that a TFDistilBertModel is being initialized from a TFDistilBertForSequenceClassification checkpoint, so it is normal that the dropout and classification layers are ignored; the logs are raising a proper message. The bug is probably in the pipeline itself.

I don’t see where you see that. The pipeline is instantiating a TFDistilBertForSequenceClassification for sentiment-analysis, not a TFDistilBertModel.
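For what it’s worth, you can verify which class the pipeline instantiated by inspecting its model attribute; a minimal sketch based on the example from the first post (with only TensorFlow installed, as in the reported environment):

from transformers import pipeline

classifier = pipeline('sentiment-analysis')
# The pipeline keeps the loaded model on its `model` attribute; printing its
# class shows which architecture was actually instantiated for this task.
print(type(classifier.model).__name__)
# In a TensorFlow-only environment this should print: TFDistilBertForSequenceClassification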

Hi @bas!

Actually, while there is a bit of an issue here, there is nothing for you to worry about.

If you want the full explanation, you’re basically getting two messages here:

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertModel: ['pre_classifier', 'classifier', 'dropout_19']

and

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']

You shouldn’t be getting the first one, as it tells you that it is loading the checkpoint into the TFDistilBertModel and that there are some missing weights. That first message happens because we’re doing a sanity check to validate that the checkpoint can be safely loaded in that framework (here, TensorFlow). We should have a better way of doing this, as we’re not interested in the TFDistilBertModel but in the TFDistilBertForSequenceClassification. We shouldn’t need to load the weights into the first model, and we definitely shouldn’t output missing/unused weights for that step.
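In the meantime, one way to sidestep that extra load on the user side is to instantiate the task model yourself and hand it to the pipeline; a minimal sketch, assuming a transformers version (3.5.0 qualifies) where pipeline() accepts a preloaded model and tokenizer:

from transformers import pipeline, AutoTokenizer, TFAutoModelForSequenceClassification

# Load the task-specific class directly, so the checkpoint is only read into
# the architecture that actually uses its classification head, then pass the
# objects to the pipeline instead of a model name.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

This should skip the framework sanity-check load, though the dropout-related message discussed below may still appear.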

Regarding the second one, it tells you that the TFDistilBertForSequenceClassification does not use dropout_19 in its architecture and therefore doesn’t load it. However, it newly initializes dropout_38. You shouldn’t worry about this either, because dropout layers have no weights. We should probably add a check and only report layers as unused or newly initialized when they are not dropouts.
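To see why this is harmless, note that a Keras dropout layer holds no weights at all; a tiny sketch with plain TensorFlow:

import tensorflow as tf

# Dropout only stores a rate, not parameters, so there is nothing in the
# checkpoint to load for it and nothing meaningful to re-initialize.
dropout = tf.keras.layers.Dropout(0.1)
print(dropout.weights)   # -> []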


Hi @lysandre,

Thanks a lot for your explanation (also thanks to sgugger and jplu) :+1:! Although your answers/discussions go a little too deep for me, since I’m just starting to use the Transformers library, I’m glad to hear that the messages don’t hurt.

It is strange, though, that I get these messages on my local machine with every configuration I’ve tried (Python 3.7, 3.8, conda envs, Python venv environments, different versions of TensorFlow and Transformers), but in other environments (e.g. Google Colab) they never seem to appear.
But I will ignore them.
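(If anyone else wants to hide these messages rather than just mentally ignore them, the transformers logging utilities can lower the library’s verbosity; a minimal sketch, assuming a version that ships transformers.logging, which 3.5.0 does:)

from transformers import logging, pipeline

# Only show errors from the transformers library; the "Some layers ..."
# messages are emitted as warnings, so they no longer appear.
logging.set_verbosity_error()

classifier = pipeline('sentiment-analysis')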
