Could not find MistralForCausalLM in transformers

Hi. I fine-tuned mistralai/Mistral-Small-24B-Base-2501 on a dataset and now I'm trying to run inference with it. I'm using `AutoModelForCausalLM.from_pretrained` to load it, but I get this error: `Could not find MistralForCausalLM neither in transformers`. I'm running the latest version of transformers, 4.56.0. What might be the reason? Installing transformers from source, as suggested in support for MistralForCausalLM · Issue #26458 · huggingface/transformers · GitHub, didn't fix it.
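For reference, the failing call is roughly this (a minimal sketch; the checkpoint path is a placeholder for my fine-tuned output directory):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "path/to/finetuned-checkpoint" stands in for the actual output directory
model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-checkpoint")
```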

1 Like

Hmm, maybe it’s missing dependencies or something…?
I don’t think the class itself is actually missing…

```
pip install -U mistral_common sentencepiece
```

```python
import sys

import transformers

print("transformers", transformers.__version__)
try:
    from transformers.models.mistral.modeling_mistral import MistralForCausalLM
    print("MistralForCausalLM OK")
except Exception as e:
    print("MistralForCausalLM FAIL:", e, file=sys.stderr)
```

@John6666 I get this when I run that code snippet:

```
MistralForCausalLM FAIL: partially initialized module 'torchvision' has no attribute 'extension' (most likely due to a circular import)
```

1 Like

Judging just by the error, it’s probably a version mismatch between torch and torchvision.

```
pip install torchvision==x.xx.x
```

Domain Version Compatibility Matrix for PyTorch
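Before pinning a version, you can check which pair is currently installed without importing the broken module (a minimal sketch using importlib.metadata):

```python
from importlib.metadata import version

# torchvision wheels are built against a specific torch release; if this pair
# isn't listed together in the compatibility matrix, importing torchvision can
# fail with the circular-import error above
print("torch:      ", version("torch"))
print("torchvision:", version("torchvision"))
```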

1 Like

@John6666 thanks! Yes, aligning the versions helped 🙂

I have fine-tuned the model and am now running into this runtime error while loading it:

```
RuntimeError: Error(s) in loading state_dict for Embedding:
	size mismatch for weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([131072, 5120])
```

Any idea what might be causing this?

1 Like

Based on the error message, I'd guess it's either trying to load a PEFT adapter as if it were the full model weights, or the saved weights themselves are corrupted…
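If it's the adapter case, the usual fix is to load the base model first and attach the adapter with PEFT, rather than pointing `AutoModelForCausalLM` at the adapter directory (a minimal sketch; the adapter path is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# load the base weights, then attach the fine-tuned adapter on top;
# "path/to/adapter" stands in for the fine-tuning output directory
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Small-24B-Base-2501")
model = PeftModel.from_pretrained(base, "path/to/adapter")
```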

@John6666 could this be because of DeepSpeed? When I do `len(tokenizer)` it prints 131072, which matches the 131072 embedding rows the model expects.

1 Like

> could this be because of DeepSpeed

I think that's very likely…
When a save fails under DeepSpeed, it appears that an empty tensor gets written instead of the gathered weights, which would explain the torch.Size([0]) shape in your checkpoint.
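One way to confirm is to inspect the tensor shapes stored in the checkpoint itself (a minimal sketch, assuming a safetensors shard; the filename is a placeholder):

```python
from safetensors import safe_open

# shapes like [0] here would confirm the weights were never gathered
# before being written out
with safe_open("model.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key, f.get_slice(key).get_shape())
```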

@John6666 I’m using "stage3_gather_16bit_weights_on_model_save": true as suggested here. Not sure what else is causing this.

1 Like

This may also occur when using BF16 or an older version of PEFT.

```
pip install -U peft
```

@John6666 using `model.save_16bit_model()` to save the model instead of `save_pretrained()` fixed this!
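For anyone hitting the same issue, the call looks roughly like this (a minimal sketch; `engine` is the object returned by `deepspeed.initialize`, and the paths are placeholders):

```python
# under ZeRO-3 this gathers the sharded parameters and writes a consolidated
# 16-bit state dict; it relies on
# "stage3_gather_16bit_weights_on_model_save": true in the DeepSpeed config
engine.save_16bit_model("output_dir", "pytorch_model.bin")
```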

1 Like

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.