Could not find MistralForCausalLM in transformers

Hi. I fine-tuned mistralai/Mistral-Small-24B-Base-2501 on a dataset and now I'm trying to run inference with it. I'm using `AutoModelForCausalLM.from_pretrained` to load it, but I get this error: `Could not find MistralForCausalLM neither in transformers`. I'm running the latest version of transformers, 4.56.0. What might be the reason? Installing transformers from source, as suggested in support for MistralForCausalLM · Issue #26458 · huggingface/transformers · GitHub, didn't fix it.
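For reference, the failing call is roughly this (a minimal sketch; the checkpoint path is a placeholder for my fine-tuned output directory):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "path/to/finetuned-checkpoint" stands in for the actual output directory
model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-checkpoint")
```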

1 Like

Hmm, maybe it’s missing dependencies or something…?
I don’t think the class itself is actually missing…

```
pip install -U mistral_common sentencepiece
```

```python
import sys

import transformers

print("transformers", transformers.__version__)
try:
    from transformers.models.mistral.modeling_mistral import MistralForCausalLM
    print("MistralForCausalLM OK")
except Exception as e:
    print("MistralForCausalLM FAIL:", e, file=sys.stderr)
```

@John6666 I get this when I run that code snippet:

```
MistralForCausalLM FAIL: partially initialized module 'torchvision' has no attribute 'extension' (most likely due to a circular import)
```

1 Like

Judging just by the error, it’s probably a version mismatch between torch and torchvision.

```
pip install torchvision==x.xx.x
```

Domain Version Compatibility Matrix for PyTorch
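Before pinning a version, you can check which pair is currently installed without importing the broken module (a minimal sketch using importlib.metadata):

```python
from importlib.metadata import version

# torchvision wheels are built against a specific torch release; if this pair
# isn't listed together in the compatibility matrix, importing torchvision can
# fail with the circular-import error above
print("torch:      ", version("torch"))
print("torchvision:", version("torchvision"))
```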

1 Like

@John6666 thanks! Yes, aligning the versions helped 🙂

I have fine-tuned the model and am now running into this runtime error while loading it:

```
RuntimeError: Error(s) in loading state_dict for Embedding:
	size mismatch for weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([131072, 5120])
```

Any idea what might be causing this?

1 Like

Based on the error message, I'd guess it's either trying to load a PEFT adapter as if it were the full model weights, or the saved weights themselves are corrupted…
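If it's the adapter case, the usual fix is to load the base model first and attach the adapter with PEFT, rather than pointing `AutoModelForCausalLM` at the adapter directory (a minimal sketch; the adapter path is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# load the base weights, then attach the fine-tuned adapter on top;
# "path/to/adapter" stands in for the fine-tuning output directory
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Small-24B-Base-2501")
model = PeftModel.from_pretrained(base, "path/to/adapter")
```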

@John6666 could this be because of DeepSpeed? When I do `len(tokenizer)` it prints 131072, which matches the 131072 embedding rows the model expects.

1 Like

> could this be because of DeepSpeed

I think that's very likely…
When a save fails under DeepSpeed, it appears that an empty tensor gets written instead of the gathered weights, which would explain the torch.Size([0]) shape in your checkpoint.
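One way to confirm is to inspect the tensor shapes stored in the checkpoint itself (a minimal sketch, assuming a safetensors shard; the filename is a placeholder):

```python
from safetensors import safe_open

# shapes like [0] here would confirm the weights were never gathered
# before being written out
with safe_open("model.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key, f.get_slice(key).get_shape())
```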

@John6666 I’m using "stage3_gather_16bit_weights_on_model_save": true as suggested here. Not sure what else is causing this.

1 Like

This may also occur when using BF16 or an older version of PEFT.

```
pip install -U peft
```

@John6666 using `model.save_16bit_model()` to save the model instead of `save_pretrained()` fixed this!
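For anyone hitting the same issue, the call looks roughly like this (a minimal sketch; `engine` is the object returned by `deepspeed.initialize`, and the paths are placeholders):

```python
# under ZeRO-3 this gathers the sharded parameters and writes a consolidated
# 16-bit state dict; it relies on
# "stage3_gather_16bit_weights_on_model_save": true in the DeepSpeed config
engine.save_16bit_model("output_dir", "pytorch_model.bin")
```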

1 Like

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.