I was able to reproduce the warning here too. Since Transformers moved from v4 to v5, the UNEXPECTED list is displayed more prominently, so it can look more alarming than it used to.
As with the equivalent warning in v4, there is no real harm in most use cases; it is mainly informational.
What that warning means in your case
AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT") uses the modelâs config to choose the base architecture class automatically. For a BERT-family checkpoint, AutoModel resolves to BertModel. Hugging Faceâs Auto Classes docs show exactly this pattern: AutoModel.from_pretrained("...bert...") creates a BertModel. (Hugging Face)
Your checkpoint, however, appears to contain not only the base BERT encoder weights, but also pretraining-head weights. The model page is tagged Fill-Mask, and the model card says this model was initialized from BioBERT and pretrained on MIMIC notes, while also showing AutoModel.from_pretrained(...) as a valid usage example. That combination is consistent with a checkpoint that can be used as a plain encoder but still carries extra task-specific weights from pretraining. (Hugging Face)
So the warning is saying:
- the base encoder loaded, and
- some extra checkpoint weights were present but not needed by BertModel. (Hugging Face)
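If you want to confirm which concrete class AutoModel picked, here is a minimal check (it reuses the HF_TOKEN variable from your script and assumes the checkpoint downloads normally):

import os
from transformers import AutoModel, BertModel

HF_TOKEN = os.environ.get("HF_TOKEN")  # or however your script already defines it

# AutoModel reads the checkpoint's config and resolves the concrete class from it;
# for this BERT-family checkpoint that should be BertModel (backbone only).
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)
print(type(model).__name__)          # expected: BertModel
print(isinstance(model, BertModel))  # expected: True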
What "when loading from different task/architecture" means
Here, "different task/architecture" does not mean "completely different neural network family." It usually means:
- same backbone family: BERT
- but a different model class for a different task
For BERT, Hugging Face distinguishes classes such as:
- BertModel: backbone only
- BertForMaskedLM: backbone + masked-language-model head
- BertForPreTraining: backbone + masked-language-model head + next-sentence-prediction head. (Hugging Face)
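If it helps to see that difference concretely, here is a small structural comparison. It builds each class from a bare default config, so nothing is downloaded and the weights are random; only the module layout matters here:

from transformers import BertConfig, BertModel, BertForMaskedLM, BertForPreTraining

config = BertConfig()  # bare default config; we only care about which heads each class has

print(hasattr(BertModel(config), "cls"))           # False: backbone only, no pretraining heads
print(hasattr(BertForMaskedLM(config), "cls"))     # True: cls.predictions (MLM head)
print(hasattr(BertForPreTraining(config), "cls"))  # True: cls.predictions + cls.seq_relationship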
Hugging Face's long-standing explanation for this warning is explicit: it is expected when you initialize one class from a checkpoint trained for another task or architecture, and it is only unexpected when the source and target classes should be exactly identical. (GitHub)
That is the phrase you were asking about.
Why those exact cls.* keys show up
The names in your warning point to BERT's pretraining heads:
- cls.predictions.* corresponds to the masked language modeling head
- cls.seq_relationship.* corresponds to the next sentence prediction head. (Hugging Face)
The BERT docs describe BertForPreTraining as a BERT model with two heads on top: a masked language modeling head and a next sentence prediction head. They also note that BERT's pooler_output is trained from the next sentence prediction objective during pretraining. (Hugging Face)
So in plain English:
- the checkpoint contains weights for BERT's original pretraining tasks,
- but AutoModel asked for only the backbone encoder,
- therefore those head weights are reported as unexpected and ignored. (Hugging Face)
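If you want to verify that these heads really sit in the checkpoint file itself, you can peek at the raw parameter names. This is just a sketch: it assumes the repo ships a PyTorch weights file named pytorch_model.bin, which is typical for older BERT checkpoints; adjust the filename if the repo uses safetensors instead.

import torch
from huggingface_hub import hf_hub_download

# Download only the weights file and list the stored parameter names that
# belong to the pretraining heads (cls.predictions.*, cls.seq_relationship.*).
path = hf_hub_download("emilyalsentzer/Bio_ClinicalBERT", "pytorch_model.bin")
state_dict = torch.load(path, map_location="cpu", weights_only=True)
print(sorted(k for k in state_dict if k.startswith("cls.")))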
Do you need to change your system or environment?
No. Nothing in this warning suggests a problem with:
- Python 3.13.6
- your venv
- macOS
- Apple Silicon / M1. (Hugging Face)
This is a model-class / checkpoint-content issue, not a platform issue. The warning itself is the same kind of warning Hugging Face documents for "loading from another task," and your keys match that pattern closely. (GitHub)
So the fix is not "reinstall Python" or "change your machine."
What you should change, if anything
That depends on what you actually want from the model.
If you want embeddings / hidden states / encoder outputs
Keep using:
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)
That is appropriate for using the model as a BERT encoder, and the model card itself shows AutoModel.from_pretrained(...) as a valid way to use this checkpoint. In that case, the warning is usually benign. (Hugging Face)
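For that encoder-style use, a minimal end-to-end sketch looks like this. HF_TOKEN is whatever your script already uses, and the mean-pooling step is just one common way to turn token states into a single vector, not something the model card mandates:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)
model.eval()

inputs = tokenizer("Patient denies chest pain or shortness of breath.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden state over non-padding tokens -> one 768-dim vector.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, 768])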
If you want masked-token prediction
Use:
from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)
BertForMaskedLM is the BERT class with a language-modeling head on top, which matches fill-mask usage better than plain BertModel. (Hugging Face)
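A short fill-mask sketch, in case that is the behavior you actually want (the pipeline route loads the MLM class under the hood, so the cls.predictions.* weights are used instead of dropped; the example sentence is only an illustration):

from transformers import pipeline

fill = pipeline("fill-mask", model="emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)
for pred in fill("The patient was started on [MASK] for hypertension."):
    print(pred["token_str"], round(pred["score"], 3))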
If you want the closest match to the full original BERT pretraining checkpoint
Use:
from transformers import AutoModelForPreTraining
model = AutoModelForPreTraining.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", token=HF_TOKEN)
BertForPreTraining is documented as having both the MLM head and the NSP head, which is the closest conceptual match to the cls.predictions.* and cls.seq_relationship.* weights in your warning. (Hugging Face)
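Loading it this way also lets you verify that the pretraining heads are actually consumed rather than dropped, using the same output_loading_info mechanism described in the diagnostic section below (a sketch; whether the lists come back completely empty depends on exactly which weights the repo ships):

from transformers import AutoModelForPreTraining

model, info = AutoModelForPreTraining.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT",
    token=HF_TOKEN,
    output_loading_info=True,
)
print(info["unexpected_keys"])  # ideally empty: no leftover cls.* weights
print(info["missing_keys"])     # any head weights the checkpoint does not provide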
The practical answer for your case
Because you said:
- the script runs, and
- it produces sensible output,
the most likely explanation is that the base encoder loaded correctly, and only the extra pretraining heads were skipped. That is exactly the sort of case Hugging Face describes as expected when loading from another task/class combination. (GitHub)
So the most direct answer is:
- No environment changes are needed
- Your warning is probably harmless for encoder use
- Only change the model class if your actual task is MLM or full pretraining-head behavior. (Hugging Face)
One useful diagnostic check
Hugging Face documents output_loading_info=True, which returns missing keys, unexpected keys, and error messages from from_pretrained(). That can help confirm that the only unexpected keys are the cls.* ones you already saw. (Hugging Face)
from transformers import AutoModel

model, info = AutoModel.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT",
    token=HF_TOKEN,
    output_loading_info=True,
)
print(info["unexpected_keys"])
print(info["missing_keys"])
print(info["error_msgs"])
If the unexpected keys are just the cls.* pretraining-head keys, that strongly supports the benign interpretation above. (Hugging Face)
Final conclusion
Your warning does not indicate a broken environment.
It means:
- AutoModel gave you BertModel
- the checkpoint also includes pretraining-head weights
- those extra weights are reported as UNEXPECTED
- and this is normal when the checkpoint and instantiated class are for different BERT task variants. (Hugging Face)
For a graph-database pipeline that uses ClinicalBERT as an encoder, I would usually leave the environment alone and keep AutoModel unless you specifically need masked-token logits or full pretraining-head behavior. (Hugging Face)