AutoModel with ClinicalBERT gives UNEXPECTED warning

I’m using the Python ‘transformers’ package (version 5.3.0) in a script that interacts with an existing graph database. The script seems to work and gives me sensible output, but it produces a warning that concerns me.

The warning is produced when this statement is executed:

model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)

The text of the warning is as follows:

Loading weights: 100%|

BertModel LOAD REPORT from: emilyalsentzer/Bio_ClinicalBERT
Key                                        | Status
-------------------------------------------+-----------
cls.predictions.transform.dense.weight     | UNEXPECTED
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED
cls.predictions.transform.LayerNorm.weight | UNEXPECTED
cls.predictions.bias                       | UNEXPECTED
cls.seq_relationship.weight                | UNEXPECTED
cls.seq_relationship.bias                  | UNEXPECTED
cls.predictions.transform.dense.bias       | UNEXPECTED
cls.predictions.decoder.weight             | UNEXPECTED

Notes:

  • UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.

I don’t understand what is meant by the phrase “when loading from different task/architecture”. Can someone please explain it? I’m running Python 3.13.6 in a venv on a MacBook with an M1 processor.

What changes do I need to make to my system/environment such that this ‘UNEXPECTED’ issue is resolved?


I was able to reproduce the warning here too. Since Transformers upgraded from v4 to v5, the UNEXPECTED display has become more visually prominent, which can catch people off guard. :sweat_smile:

Similar to the warning in v4, there’s no real harm in most use cases. It’s mainly a warning for clarification.


What that warning means in your case

AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT") uses the model’s config to choose the base architecture class automatically. For a BERT-family checkpoint, AutoModel resolves to BertModel. Hugging Face’s Auto Classes docs show exactly this pattern: AutoModel.from_pretrained("...bert...") creates a BertModel. (Hugging Face)
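You can see this resolution happen without downloading anything by building the model from a local config. This is just a sketch: the tiny dimensions below are arbitrary placeholders, not Bio_ClinicalBERT’s real sizes, and the weights are randomly initialized.

```python
# Sketch: AutoModel picks the concrete class from the config's model_type.
# Tiny arbitrary sizes so nothing is downloaded; weights are random.
from transformers import AutoModel, BertConfig

cfg = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = AutoModel.from_config(cfg)
print(type(model).__name__)  # BertModel
```

The same dispatch happens inside `from_pretrained`; the config stored with the checkpoint says `model_type: bert`, so `AutoModel` hands loading off to `BertModel`.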

Your checkpoint, however, appears to contain not only the base BERT encoder weights, but also pretraining-head weights. The model page is tagged Fill-Mask, and the model card says this model was initialized from BioBERT and pretrained on MIMIC notes, while also showing AutoModel.from_pretrained(...) as a valid usage example. That combination is consistent with a checkpoint that can be used as a plain encoder but still carries extra task-specific weights from pretraining. (Hugging Face)

So the warning is saying:

  • the base encoder loaded, and
  • some extra checkpoint weights were present but not needed by BertModel. (Hugging Face)

What “when loading from different task/architecture” means

Here, “different task/architecture” does not mean “completely different neural network family.” It usually means:

  • same backbone family: BERT
  • but a different model class for a different task

For BERT, Hugging Face distinguishes classes such as:

  • BertModel: backbone only
  • BertForMaskedLM: backbone + masked-language-model head
  • BertForPreTraining: backbone + masked-language-model head + next-sentence-prediction head. (Hugging Face)

Hugging Face’s long-standing explanation for this warning is explicit: it is expected if you initialize one class from a checkpoint trained for another task or architecture, and is not expected only when you believe the source and target classes should be exactly identical. (GitHub)

That is the phrase you were asking about.

Why those exact cls.* keys show up

The names in your warning point to BERT’s pretraining heads:

  • cls.predictions.* corresponds to the masked language modeling head
  • cls.seq_relationship.* corresponds to the next sentence prediction head. (Hugging Face)

The BERT docs describe BertForPreTraining as a BERT model with two heads on top: a masked language modeling head and a next sentence prediction head. They also note that BERT’s pooler_output is trained from the next sentence prediction objective during pretraining. (Hugging Face)
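A quick way to see where those cls.* names come from, without downloading a checkpoint, is to compare the parameter names of the two classes on a tiny random config (the dimensions below are arbitrary, not the real model’s):

```python
# Sketch: the pretraining class carries extra cls.* head parameters that the
# plain encoder does not have. Tiny random config, nothing is downloaded.
from transformers import BertConfig, BertModel, BertForPreTraining

cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                 num_attention_heads=2, intermediate_size=64)

encoder_keys = set(BertModel(cfg).state_dict())
pretrain_keys = set(BertForPreTraining(cfg).state_dict())

# In BertForPreTraining, encoder weights live under a "bert." prefix, and the
# two pretraining heads live under "cls." -- exactly the keys in the warning.
head_keys = {k for k in pretrain_keys if k.startswith("cls.")}
print(sorted(head_keys))
```

The names printed here match the `cls.predictions.*` and `cls.seq_relationship.*` keys reported as UNEXPECTED: they exist in the checkpoint but have no home in `BertModel`.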

So in plain English:

  • the checkpoint contains weights for BERT’s original pretraining tasks,
  • but AutoModel asked for only the backbone encoder,
  • therefore those head weights are reported as unexpected and ignored. (Hugging Face)

Do you need to change your system or environment?

No. Nothing in this warning suggests a problem with:

  • Python 3.13.6
  • your venv
  • macOS
  • Apple Silicon / M1. (Hugging Face)

This is a model-class / checkpoint-content issue, not a platform issue. The warning itself is the same kind of warning Hugging Face documents for “loading from another task,” and your keys match that pattern closely. (GitHub)

So the fix is not “reinstall Python” or “change your machine.”

What you should change, if anything

That depends on what you actually want from the model.

If you want embeddings / hidden states / encoder outputs

Keep using:

model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)

That is appropriate for using the model as a BERT encoder, and the model card itself shows AutoModel.from_pretrained(...) as a valid way to use this checkpoint. In that case, the warning is usually benign. (Hugging Face)
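As a sketch of what encoder use looks like downstream: `BertModel` returns hidden states that you can pool into embeddings. The tiny random config below avoids any download; with the real checkpoint you would tokenize text and pass its `input_ids` instead of the random ones here.

```python
# Sketch of encoder use: pool last_hidden_state into one vector per sequence.
# Tiny random config and random input ids, so nothing is downloaded.
import torch
from transformers import BertConfig, BertModel

cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                 num_attention_heads=2, intermediate_size=64)
model = BertModel(cfg)

input_ids = torch.randint(0, cfg.vocab_size, (1, 8))
with torch.no_grad():
    out = model(input_ids=input_ids)

print(out.last_hidden_state.shape)          # torch.Size([1, 8, 32])
embedding = out.last_hidden_state.mean(dim=1)  # simple mean pooling
print(embedding.shape)                      # torch.Size([1, 32])
```

Note that none of this touches the skipped `cls.*` weights, which is why the warning is harmless for this use case.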

If you want masked-token prediction

Use:

from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)

BertForMaskedLM is the BERT class with a language-modeling head on top, which matches fill-mask usage better than plain BertModel. (Hugging Face)
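To illustrate the difference in output shape (again with a tiny random config so nothing downloads), the MLM class returns per-token logits over the vocabulary, which is exactly what fill-mask needs:

```python
# Sketch: BertForMaskedLM outputs vocabulary logits for every token position.
# Tiny random config and random input ids; dimensions are arbitrary.
import torch
from transformers import BertConfig, BertForMaskedLM

cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                 num_attention_heads=2, intermediate_size=64)
model = BertForMaskedLM(cfg)

input_ids = torch.randint(0, cfg.vocab_size, (1, 8))
with torch.no_grad():
    out = model(input_ids=input_ids)

print(out.logits.shape)  # (batch, seq_len, vocab_size) -> torch.Size([1, 8, 100])
```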

If you want the closest match to the full original BERT pretraining checkpoint

Use:

from transformers import AutoModelForPreTraining
model = AutoModelForPreTraining.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)

BertForPreTraining is documented as having both the MLM head and the NSP head, which is the closest conceptual match to the cls.predictions.* and cls.seq_relationship.* weights in your warning. (Hugging Face)
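The pretraining class exposes both head outputs, one per skipped key group in your warning. A tiny random-config sketch (arbitrary sizes, no download) makes the correspondence visible:

```python
# Sketch: BertForPreTraining returns MLM logits (cls.predictions.*) and
# NSP logits (cls.seq_relationship.*). Tiny random config, no download.
import torch
from transformers import BertConfig, BertForPreTraining

cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                 num_attention_heads=2, intermediate_size=64)
model = BertForPreTraining(cfg)

input_ids = torch.randint(0, cfg.vocab_size, (1, 8))
with torch.no_grad():
    out = model(input_ids=input_ids)

print(out.prediction_logits.shape)        # MLM head: torch.Size([1, 8, 100])
print(out.seq_relationship_logits.shape)  # NSP head: torch.Size([1, 2])
```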

The practical answer for your case

Because you said:

  • the script runs, and
  • it produces sensible output,

the most likely explanation is that the base encoder loaded correctly, and only the extra pretraining heads were skipped. That is exactly the sort of case Hugging Face describes as expected when loading from another task/class combination. (GitHub)

So the most direct answer is:

  • No environment changes are needed
  • Your warning is probably harmless for encoder use
  • Only change the model class if your actual task is MLM or full pretraining-head behavior. (Hugging Face)

One useful diagnostic check

Hugging Face documents output_loading_info=True, which returns missing keys, unexpected keys, and error messages from from_pretrained(). That can help confirm that the only unexpected keys are the cls.* ones you already saw. (Hugging Face)

model, info = AutoModel.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT",
    token=HF_TOKEN,
    output_loading_info=True,
)

print(info["unexpected_keys"])
print(info["missing_keys"])
print(info["error_msgs"])

If the unexpected keys are just the cls.* pretraining-head keys, that strongly supports the benign interpretation above. (Hugging Face)
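A few lines of plain Python can automate that check. The key list below is copied from the warning above; in practice you would use `info["unexpected_keys"]` in its place.

```python
# Sketch: verify that every unexpected key belongs to one of BERT's two
# pretraining heads. The list is copied from the warning in the question.
unexpected = [
    "cls.predictions.transform.dense.weight",
    "cls.predictions.transform.LayerNorm.bias",
    "cls.predictions.transform.LayerNorm.weight",
    "cls.predictions.bias",
    "cls.seq_relationship.weight",
    "cls.seq_relationship.bias",
    "cls.predictions.transform.dense.bias",
    "cls.predictions.decoder.weight",
]

head_prefixes = ("cls.predictions", "cls.seq_relationship")
all_head_keys = all(k.startswith(head_prefixes) for k in unexpected)
print(all_head_keys)  # True -> only pretraining-head weights were skipped
```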

Final conclusion

Your warning does not indicate a broken environment.

It means:

  • AutoModel gave you BertModel
  • the checkpoint also includes pretraining-head weights
  • those extra weights are reported as UNEXPECTED
  • and this is normal when the checkpoint and instantiated class are for different BERT task variants. (Hugging Face)

For a graph-database pipeline that uses ClinicalBERT as an encoder, I would usually leave the environment alone and keep AutoModel unless you specifically need masked-token logits or full pretraining-head behavior. (Hugging Face)

Thank you for this detailed answer.

