Why is the uploaded model twice the size of the actual model?

I’ve fine-tuned the DeBERTa-v3-large model on NLI data. The foundation model is normally 833 MB (see here). My fine-tuned model is, however, 1.62 GB (see here).

I’m not sure why this is happening, and it’s not only an issue with DeBERTa; I’ve noticed the same behaviour with other models before.

Is it maybe linked to the fact that I trained with FP16 and this creates a copy of the model? If that’s the case, then there should be a way of getting the normal-sized version of the model. Otherwise, all future users need to download a model that’s twice the size actually necessary.

Maybe one for @patrickvonplaten @lysandre (size of FP16 checkpoints)?


A checkpoint saved in FP16 gets converted back to FP32 when it is loaded, because from_pretrained instantiates the model in FP32 by default and casts the weights to match. So unless you specify a torch_dtype when instantiating your model with from_pretrained, it is loaded in FP32.

The fine-tuning in mixed precision does not change that (we say mixed precision training and not half-precision training since some ops are still done in FP32, like the optimizer step and the model weight updates), so it’s not surprising your fine-tuned model ended up in FP32 and twice the size.
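If you want to check which precision a given checkpoint was actually saved in, here is a minimal sketch (assuming a standard pytorch_model.bin written by save_pretrained; torch.load itself preserves the saved dtypes):

import torch

# Load the raw state dict without instantiating a model; dtypes are kept as saved
state_dict = torch.load("path/to/checkpoint/pytorch_model.bin", map_location="cpu")
print({t.dtype for t in state_dict.values() if t.is_floating_point()})
# {torch.float32} -> ~4 bytes per parameter, {torch.float16} -> ~2 bytes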


Thanks for your response. So how do I avoid this behaviour in the future?

I looked into some older models, and for DeBERTa-v3-base the base model has the same size as my fine-tuned base model; I’m not sure what I did differently between the two.

In case it’s helpful, here is the process I used for loading and uploading:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "nli-scratch/best2-5e6lr-4ep"  # path to a local folder with the PyTorch model, ~800MB on disk
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)
# label2id, id2label and device are defined earlier in my script
model = AutoModelForSequenceClassification.from_pretrained(model_name, label2id=label2id, id2label=id2label).to(device)

### Push to hub
#!huggingface-cli login

model.push_to_hub(repo_path_or_name="DeBERTa-v3-large-mnli-fever-anli-ling-wanli", use_temp_dir=True, use_auth_token="XXX")
tokenizer.push_to_hub(repo_path_or_name="DeBERTa-v3-large-mnli-fever-anli-ling-wanli", use_temp_dir=True, use_auth_token="XXX")

# ! the resulting uploaded model is 1.6GB, although the local model I loaded is 800MB
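As a sanity check, the expected file size can also be estimated from the parameter count and dtype of the in-memory model (a rough sketch that ignores non-parameter buffers):

import torch

n_params = sum(p.numel() for p in model.parameters())
bytes_per_param = torch.finfo(next(model.parameters()).dtype).bits // 8
print(f"expected checkpoint size: ~{n_params * bytes_per_param / 1e9:.2f} GB")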

I don’t understand when the model I upload ends up at double the size and when it stays at the normal size.

I would be very grateful for advice on how to avoid this in the future, @sgugger

You can pass torch_dtype=torch.float16 to your call to from_pretrained to load the pretrained model in half precision, or convert your model to half precision before saving it.
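For the second option, a minimal sketch (the output directory name here is just an example):

model = model.half()  # casts all floating-point parameters to torch.float16 in place
model.save_pretrained("my-fp16-model")  # the saved checkpoint is now roughly half the size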


Great, this solved the issue!
I feel like this could even be the default when (up)loading models. I didn’t know about this and didn’t see it in the documentation, and now there are several models I’ve uploaded that have been downloaded thousands of times but are hundreds of MB larger than necessary, causing unneeded network traffic.
Or is there an important downside to always uploading in float16?

Just in case someone needs the updated code:

import torch

model = AutoModelForSequenceClassification.from_pretrained(model_name, label2id=label2id, id2label=id2label, torch_dtype=torch.float16).to(device)
model.push_to_hub(repo_path_or_name="DeBERTa-v3-large-mnli-fever-anli-ling-wanli", use_temp_dir=False, use_auth_token="XXX")
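To double-check the precision before pushing, you can print the dtype of the loaded weights:

print(next(model.parameters()).dtype)  # should now be torch.float16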