Hi. I fine-tuned mistralai/Mistral-Small-24B-Base-2501 on a dataset and now I'm trying to run inference with it. I'm using `AutoModelForCausalLM.from_pretrained` to load it, but I'm getting this error: `Could not find MistralForCausalLM neither in transformers`. I'm running the latest transformers, 4.56.0. What might be the reason? Installing transformers from source, as suggested in support for MistralForCausalLM · Issue #26458 · huggingface/transformers · GitHub, didn't fix it.
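For context, the inference code is essentially this minimal sketch (the checkpoint path and generation settings are placeholders, not my exact script):

```python
# Minimal sketch of the inference setup (path and dtype are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./mistral-small-24b-finetuned"  # hypothetical local checkpoint dir

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference
    device_map="auto",
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```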
Hmm, maybe it’s missing dependencies or something…?
I don’t think the class itself is actually missing…
```
pip install -U mistral_common sentencepiece
```
```python
import transformers, sys

print("transformers", transformers.__version__)
try:
    from transformers.models.mistral.modeling_mistral import MistralForCausalLM
    print("MistralForCausalLM OK")
except Exception as e:
    print("MistralForCausalLM FAIL:", e, file=sys.stderr)
```
@John6666 getting this when I run that code snippet
```
MistralForCausalLM FAIL: partially initialized module 'torchvision' has no attribute 'extension' (most likely due to a circular import)
```
Judging just by the error, it's probably a version mismatch between torch and torchvision.

```
pip install torchvision==x.xx.x
```
Domain Version Compatibility Matrix for PyTorch
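Before picking a version from the matrix, you can check what's currently installed without importing torchvision (a small diagnostic sketch; importing torchvision itself may fail while the versions are mismatched):

```python
# Check the installed torch / torchvision versions without importing
# torchvision, then compare them against the compatibility matrix.
from importlib.metadata import version

print("torch       :", version("torch"))
print("torchvision :", version("torchvision"))
```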
@John6666 thanks! Yes, aligning the versions helped.
I have fine-tuned the model and now I'm running into this runtime error while loading it:

```
RuntimeError: Error(s) in loading state_dict for Embedding:
    size mismatch for weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([131072, 5120]).
```
Any idea what might be causing this?
Based on the error message, I'd guess it's either trying to load the PEFT adapter as if it were the full model weights, or the saved weights are corrupted…
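If it's the former, the adapter needs to be attached to the base model rather than passed to `AutoModelForCausalLM` directly. A minimal sketch, assuming a LoRA-style PEFT adapter (the adapter path is a placeholder):

```python
# Sketch: load a PEFT adapter on top of the base model instead of
# passing the adapter directory to AutoModelForCausalLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-Small-24B-Base-2501"
adapter_path = "./my-finetune-adapter"  # hypothetical adapter directory

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_path)
```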
@John6666 could this be because of deepspeed? When I do `len(tokenizer)` it prints 131072.
> could this be because of deepspeed

I think very likely…
When saving fails in DeepSpeed, it appears an empty tensor is saved instead.
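You can check for that by listing the tensor shapes in the saved checkpoint. A rough sketch, assuming the weights were written as safetensors shards (adjust the glob to whatever files are actually in the output directory):

```python
# Sketch: list tensor shapes in the saved checkpoint shards to spot
# empty ([0]-shaped) tensors, e.g. the embedding weight.
import glob
from safetensors import safe_open

for path in glob.glob("./output/*.safetensors"):  # hypothetical output dir
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            shape = f.get_slice(name).get_shape()
            if 0 in shape:
                print(f"{path}: {name} has empty shape {shape}")
```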
@John6666 I'm using `"stage3_gather_16bit_weights_on_model_save": true` as suggested here. Not sure what else could be causing this.
This may also occur when using BF16 or when using an older version of PEFT.

```
pip install -U peft
```
@John6666 using `model.save_16bit_model()` to save the model instead of `save_pretrained()` fixed this!
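For anyone hitting the same thing, the saving step now looks roughly like this (`model` is the DeepSpeed engine wrapping the fine-tuned model; paths and filenames are placeholders from my setup):

```python
# Sketch: save consolidated 16-bit weights via the DeepSpeed engine
# instead of calling save_pretrained() on the ZeRO-sharded module.
# Requires "stage3_gather_16bit_weights_on_model_save": true in the config.
output_dir = "./mistral-small-24b-finetuned"  # placeholder

model.save_16bit_model(output_dir, "pytorch_model.bin")

# The config and tokenizer still have to be saved separately so that
# AutoModelForCausalLM.from_pretrained(output_dir) can load the folder.
model.module.config.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
```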