google/mt5-xl
Sorry, that's the general rule, but in this case the problem lies with the model repo itself: the 15 GB checkpoint is stored as a single unsharded file. These days, weights are usually saved in sharded form, which makes loading large models much easier. The quickest fix is to re-save the model yourself with sharding enabled, then either push it to the Hub or keep it somewhere locally and load it onto the GPU from there.
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-xl")
model.save_pretrained("mt5-xl-sft-2gb", safe_serialization=True, max_shard_size="2GB") # or "5GB", "10GB", etc.
#model.push_to_hub("mt5-xl-sft-2gb", safe_serialization=True, max_shard_size="2GB") # if uploading to Hugging Face Hub directly
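For intuition, `max_shard_size` works by greedily packing weights into shard files until adding the next weight would exceed the limit, and writing an index file (`model.safetensors.index.json`) that maps each weight name to its shard. The sketch below is a simplified, hypothetical version of that logic (the function and names are illustrative, not the `transformers` internals):

```python
def shard_state_dict(weight_sizes, max_shard_bytes):
    """Greedily group weights into shards of at most max_shard_bytes each.

    weight_sizes: dict mapping weight name -> size in bytes.
    A single weight larger than the limit still gets its own shard.
    """
    shards = []                      # list of lists of weight names
    current, current_size = [], 0
    for name, size in weight_sizes.items():
        # Start a new shard if this weight would push us over the limit.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)

    # Index in the spirit of model.safetensors.index.json:
    # weight name -> shard file that contains it.
    index = {
        name: f"model-{i + 1:05d}-of-{len(shards):05d}.safetensors"
        for i, names in enumerate(shards)
        for name in names
    }
    return shards, index

# Example: weights of 1.2, 0.5, 1.8, and 0.3 (units of GB, scaled down)
# with a 2.0 limit pack into three shards.
sizes = {"a": 1_200, "b": 500, "c": 1_800, "d": 300}
shards, index = shard_state_dict(sizes, 2_000)
print(shards)        # [['a', 'b'], ['c'], ['d']]
print(index["c"])    # model-00002-of-00003.safetensors
```

When loading, `from_pretrained` reads the index and fetches only the shard files, which is also why sharded checkpoints resume better after a failed download: a 2 GB shard is retried instead of the whole 15 GB file.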