Failed to save model and tokenizer with .save_pretrained()

Hi there,

I'm fairly new to Azure Databricks. I have a Standard_NC4as_T4_v3 single-node compute instance in Databricks. I successfully loaded a model from the Hub with .from_pretrained() and ran some inference.
When I try to store the model with the .save_pretrained() method, e.g. model.save_pretrained("/Workspace/…/model_folder/"), I get the error message "SafetensorError: Error while serializing: IoError(Os { code: 27, kind: FileTooLarge, message: \"File too large\" })".
When I additionally pass the parameter max_shard_size="xxxMB", only the last shard of the tensors ends up being saved.
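Roughly what I'm running (the model name and the exact paths are placeholders here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder model name, standing in for the checkpoint I actually load
model_name = "some-org/some-model"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# this call raises: SafetensorError ... IoError(Os { code: 27, kind: FileTooLarge ... })
model.save_pretrained("/Workspace/.../model_folder/")

# with sharding it runs through, but only the last shard is left in the folder
model.save_pretrained("/Workspace/.../model_folder/", max_shard_size="500MB")
tokenizer.save_pretrained("/Workspace/.../model_folder/")
```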

Any ideas?

Thanks and cheers,
Ingo


Did you happen to get an answer to this? I am trying to save to an MLflow run in Databricks and I am getting the same message.


Hi Ingo, avoid saving models in the Workspace, since workspace files have a 500 MB size limit and it will eventually cause issues when using Git folders. Instead, save them to DBFS or mounted storage, e.g. by running model.save_pretrained("/dbfs/…"). Hope this helps :)
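A minimal sketch, assuming the model and tokenizer from your snippet above and a placeholder DBFS path:

```python
# placeholder DBFS path; anything under /dbfs/ lands on DBFS rather than in the Workspace
save_path = "/dbfs/models/my_model"

model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
```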


Hi, thanks. I got it working by saving to a volume mount in my Unity Catalog. Best regards, Ingo
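For reference, roughly what worked for me (the catalog, schema, and volume names below are placeholders):

```python
# placeholder Unity Catalog volume path: /Volumes/<catalog>/<schema>/<volume>/...
save_path = "/Volumes/my_catalog/my_schema/my_volume/model_folder"

model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# the model can later be reloaded straight from the volume
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(save_path)
tokenizer = AutoTokenizer.from_pretrained(save_path)
```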
