Warning when using ESM pre-trained model


I am just getting started with HuggingFace and using pre-trained models. I am trying to generate a series of embeddings for amino acid sequences. I am following a tutorial from this page. However, when I load the model, I am getting the following warning:

Some weights of the model checkpoint at facebook/esm2_t6_8M_UR50D were not used when initializing EsmModel: ['lm_head.dense.bias', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing EsmModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing EsmModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Is this warning expected for my particular use case (shown below)? My understanding is that some of the checkpoint's weights are not being loaded into the model. I am just not sure whether that is expected behavior here.

import transformers

# Not totally sure what each part of this model name means
model_name = 'facebook/esm2_t6_8M_UR50D'

# Tokenizer converts a string into a form the model (ESM) can handle.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
# Load the model with the pretrained weights (as opposed to just the architecture).
model = transformers.EsmModel.from_pretrained(model_name)  # , output_hidden_states=True)

Did you find a solution to your issue?


This warning is telling you that the checkpoint on the Hub also includes masked-language-modeling weights (the `lm_head.*` entries), which are not loaded when you initialize an EsmModel. Those weights are only loaded when you load an EsmForMaskedLM model, which is EsmModel with a language-modeling head on top.
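To see the difference concretely, here is a minimal sketch (assuming `transformers` is installed and the checkpoint can be downloaded) comparing the two classes; only EsmForMaskedLM has the extra head:

```python
import transformers

model_name = "facebook/esm2_t6_8M_UR50D"

# Base encoder only: the checkpoint's lm_head.* weights are skipped,
# which is what triggers the "Some weights ... were not used" warning.
encoder = transformers.EsmModel.from_pretrained(model_name)

# Full masked-LM model: the lm_head.* weights are loaded, so no warning.
mlm = transformers.EsmForMaskedLM.from_pretrained(model_name)

print(hasattr(encoder, "lm_head"))  # the base model has no LM head
print(hasattr(mlm, "lm_head"))      # the masked-LM model does
```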

If you intend to use the model for feature extraction only, it's recommended to use EsmModel; the warning is expected and safe to ignore. You only need to load EsmForMaskedLM if you want to do masked language modeling (which is the pre-training objective).
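For the embedding use case, a minimal sketch of feature extraction with EsmModel (assuming `torch` and `transformers` are installed; the amino acid sequence below is just a made-up example):

```python
import torch
import transformers

model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.EsmModel.from_pretrained(model_name)
model.eval()

# Hypothetical example sequence; replace with your own amino acid string.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Per-residue embeddings: (batch, sequence length + 2 special tokens, hidden size)
embeddings = outputs.last_hidden_state

# Mean-pool over residues (dropping the CLS/EOS positions) for one
# fixed-size embedding per sequence.
per_sequence = embeddings[0, 1:-1].mean(dim=0)
print(embeddings.shape, per_sequence.shape)
```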
