Warning when using ESM pre-trained model

pipparichter · May 3, 2023, 2:39am

Hello!

I am just getting started with HuggingFace and using pre-trained models. I am trying to generate a series of embeddings for amino acid sequences. I am following a tutorial from this page. However, when I load the model, I am getting the following warning:

Some weights of the model checkpoint at facebook/esm2_t6_8M_UR50D were not used when initializing EsmModel: ['lm_head.dense.bias', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing EsmModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing EsmModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Is this behavior expected for my particular use case (given below)? My understanding is that some of the weights are being randomly initialized and the pre-trained weights are not being used… I am just not sure whether that is expected here.

import transformers

# Not totally sure what each part of this model name means
model_name = 'facebook/esm2_t6_8M_UR50D'

# Tokenizer converts a string into a form the model (ESM) can handle.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
# Load the model with the pretrained model weights (as opposed to just the architecture)
model = transformers.EsmModel.from_pretrained(model_name) #, output_hidden_states=True)

oliverfleetwood · December 21, 2023, 12:25pm

Did you find a solution to your issue?

nielsr · December 26, 2023, 6:34pm

Hi,

This warning is telling you that the checkpoint on the hub also includes masked language modeling weights (the lm_head.* entries in the list), which are not loaded when you initialize it as an EsmModel. Those weights are only loaded when you load an EsmForMaskedLM model, which is EsmModel with a language modeling head on top.

If you intend to use the model for feature extraction only, then it’s recommended to just use EsmModel. You only need to load EsmForMaskedLM if you want to do masked language modeling (which is the pre-training objective).
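
To make the difference concrete, here is a minimal sketch that compares the parameter names of the two classes. It builds a tiny, randomly initialized model from a made-up EsmConfig (the sizes below are assumptions picked only so it runs quickly without downloading anything; they are not the real facebook/esm2_t6_8M_UR50D settings), so it only illustrates which parameters each class owns, not the pre-trained values.

```python
from transformers import EsmConfig, EsmForMaskedLM, EsmModel

# Hypothetical tiny config, for illustration only (not the real ESM-2 config).
config = EsmConfig(
    vocab_size=33,
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
    pad_token_id=1,
    mask_token_id=32,
)

# Same config, two architectures: the bare encoder and the encoder plus LM head.
base_model = EsmModel(config)
mlm_model = EsmForMaskedLM(config)

base_keys = set(base_model.state_dict().keys())
mlm_keys = set(mlm_model.state_dict().keys())

# Parameters that belong to the language modeling head.
head_keys = sorted(k for k in mlm_keys if k.startswith("lm_head."))
print("LM head parameters in EsmForMaskedLM:", head_keys)

# EsmModel has no such parameters, so a checkpoint's lm_head.* weights have
# nowhere to go when it is loaded as an EsmModel.
print("EsmModel has lm_head parameters:", any(k.startswith("lm_head.") for k in base_keys))
```

The lm_head.* names printed for EsmForMaskedLM should correspond to the ones listed in the warning, which is why they show up as unused when the checkpoint is loaded into EsmModel.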
