Why is the uploaded model twice the size of the actual model?

I’ve fine-tuned the DeBERTa-v3-large model on NLI data. The foundation model is normally 833 MB (see here). My fine-tuned model is, however, 1.62 GB (see here).

I’m not sure why this is happening. It’s not only an issue with DeBERTa; I’ve noticed the same thing with other models before.

Is it maybe linked to the fact that I trained with FP16 and this creates a copy of the model? If that’s the case, then there should be a way of getting the normal-sized version of the model. Otherwise all future users need to download a model that’s twice as large as necessary.

maybe for @patrickvonplaten @lysandre (size of FP16 checkpoints)?

By default, from_pretrained instantiates the model in FP32 and copies the checkpoint weights into it, so a checkpoint saved in FP16 is converted back to FP32 on load. So unless you specify a torch_dtype when instantiating your model with from_pretrained, it was loaded in FP32.

Fine-tuning in mixed precision does not change that. We say mixed-precision training and not half-precision training because some ops are still done in FP32, like the optimizer step and the model weights update. So it’s not surprising that your fine-tuned model ended up in FP32 and twice the size.
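
To make the factor of two concrete, here is a rough sketch of the arithmetic using only the Python standard library. The parameter count is a made-up round number for illustration, not the real DeBERTa-v3-large count, and the helper name weights_size_bytes is hypothetical. It only counts raw weight bytes and ignores any file-format overhead.

```python
import struct

# Bytes per value in IEEE 754 half ("e") and single ("f") precision.
FP16_BYTES = struct.calcsize("e")
FP32_BYTES = struct.calcsize("f")


def weights_size_bytes(num_params: int, bytes_per_param: int) -> int:
    """Approximate size of the raw weights, ignoring file-format overhead."""
    return num_params * bytes_per_param


# Hypothetical parameter count, for illustration only.
num_params = 400_000_000

fp16_size = weights_size_bytes(num_params, FP16_BYTES)
fp32_size = weights_size_bytes(num_params, FP32_BYTES)

# The same weights take twice as many bytes in FP32 as in FP16.
print(fp32_size / fp16_size)  # prints 2.0
```

So a checkpoint that is about 800 MB in FP16 would be expected to come out at roughly 1.6 GB once it is saved in FP32, which matches the sizes reported above.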

Thanks for your response. So how do I avoid this behaviour in the future?

I looked into some old models and for DeBERTa-v3-base, the base model has the same size as my fine-tuned base model - I’m not sure what I did differently for these two.

In case it’s helpful, here is the process I used for loading and uploading:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# label2id, id2label and device are assumed to be defined earlier in the script
model_name = "nli-scratch/best2-5e6lr-4ep"  # this is the path to a local folder with the pytorch model of size 800MB
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)
model = AutoModelForSequenceClassification.from_pretrained(model_name, label2id=label2id, id2label=id2label).to(device)

### Push to hub
#!huggingface-cli login

model.push_to_hub(repo_path_or_name="DeBERTa-v3-large-mnli-fever-anli-ling-wanli", use_temp_dir=True, use_auth_token="XXX")
tokenizer.push_to_hub(repo_path_or_name="DeBERTa-v3-large-mnli-fever-anli-ling-wanli", use_temp_dir=True, use_auth_token="XXX")

# ! the resulting uploaded model is 1.6GB, although the local model I loaded is 800MB

I don’t understand when the model I upload gets double the size and when it has normal size.

I would be very grateful for advice on how to avoid this in the future, @sgugger

You can pass torch_dtype=torch.float16 to your call to from_pretrained to load the pretrained model in half precision, or convert your model to half precision before saving it.
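
For the second option (converting before saving), here is a minimal sketch. It assumes PyTorch is installed and uses a tiny torch.nn.Sequential as a hypothetical stand-in for the fine-tuned model, so that it runs without downloading anything. The helper param_bytes is also made up for this example. Since transformers PyTorch models are torch.nn.Module subclasses, the same .half() call should apply to them before save_pretrained or push_to_hub.

```python
import torch
from torch import nn


def param_bytes(model: nn.Module) -> int:
    """Total number of bytes occupied by the model's parameters."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


# Hypothetical stand-in for a fine-tuned model; PyTorch creates it in FP32 by default.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 3))
fp32_bytes = param_bytes(model)

# Convert all floating-point parameters and buffers to FP16 before saving.
model = model.half()
fp16_bytes = param_bytes(model)

print(next(model.parameters()).dtype)
print(fp32_bytes, fp16_bytes)
```

After the conversion, the parameters take half as many bytes, so whatever is saved or pushed afterwards should be roughly half the size of the FP32 version.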

Great, this solved the issue!
I feel like this could even be the default when (up)loading models. I didn’t know about this and hadn’t seen it in the documentation, so several models I uploaded, which have been downloaded thousands of times, are hundreds of MB larger than necessary and cause unnecessary network traffic.
Or is there an important downside to always uploading in float16?

Just in case someone needs the updated code:

import torch

# torch_dtype=torch.float16 loads the weights in half precision, so the pushed model stays at about 800MB
model = AutoModelForSequenceClassification.from_pretrained(model_name, label2id=label2id, id2label=id2label, torch_dtype=torch.float16).to(device)
model.push_to_hub(repo_path_or_name="DeBERTa-v3-large-mnli-fever-anli-ling-wanli", use_temp_dir=False, use_auth_token="XXX")