Pushing a quantized (4bit) model on the Hub

sooolee · June 6, 2023, 8:53am

Hi. I trained a model in 4-bit. After training, when I try to push the model to the Hub, I get this error message:

NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported.

Although the error message says ‘save_pretrained’, my code is using ‘push_to_hub’:
base_model.push_to_hub(/)

When will this function be supported? Thank you very much!!

aplamhden · July 11, 2023, 10:43am

push_to_hub uses save_pretrained to save the model to disk first (I suspect) and then uploads it to the cloud (the Hugging Face Hub).
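To illustrate why the traceback names save_pretrained even though only push_to_hub was called, here is a minimal sketch. ToyQuantizedModel and try_push are hypothetical stand-ins written for this example, not the transformers source; the only assumption carried over from the library is that push_to_hub serializes to a temporary directory through save_pretrained before uploading.

```python
import tempfile


class ToyQuantizedModel:
    """Hypothetical stand-in for a model class (not the transformers code)."""

    def __init__(self, is_4bit):
        self.is_4bit = is_4bit

    def save_pretrained(self, save_directory):
        # The guard that produces the error quoted in this thread.
        if self.is_4bit:
            raise NotImplementedError(
                "You are calling save_pretrained on a 4-bit converted model. "
                "This is currently not supported."
            )
        return save_directory

    def push_to_hub(self, repo_id):
        # Serialize locally first, so any save_pretrained error surfaces here.
        with tempfile.TemporaryDirectory() as tmp_dir:
            self.save_pretrained(tmp_dir)
            # The actual upload step is omitted in this sketch.
        return repo_id


def try_push(model, repo_id):
    """Return "pushed" on success, or the error text if saving is refused."""
    try:
        model.push_to_hub(repo_id)
        return "pushed"
    except NotImplementedError as err:
        return str(err)
```

Calling try_push(ToyQuantizedModel(is_4bit=True), "user/repo") returns the save_pretrained message, which matches what the original post describes.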

sooolee · July 28, 2023, 5:58am

Thanks for the comment, but that still doesn’t solve the issue. I tried again and got the same error message. I’m not sure whether 4-bit models are still unsupported (I doubt it). If they are supported, what am I doing wrong?

Can someone please provide an answer or some insight into this issue?

Thanks!

aplamhden · July 28, 2023, 6:13am

It is supported for sure. My problem was solved after I used peft 0.4.0, I think. Cheers

eloaf · July 28, 2023, 2:15pm

I have peft==0.4.0 and I get the same error message when trying to save a 4-bit converted model.

RonanMcGovern · July 28, 2023, 3:58pm

The docs aren’t entirely clear, but my read is that 8-bit is possible but 4-bit is not:

Note that once a model has been loaded in 4-bit it is currently not possible to push the quantized weights on the Hub. Note also that you cannot train 4-bit weights as this is not supported yet. However you can use 4-bit models to train extra parameters, this will be covered in the next section.
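One reading of the quoted docs is that the 4-bit base weights stay frozen and only the extra (adapter) parameters are trained, so only those need to be pushed. The sketch below assumes a LoRA setup with peft and bitsandbytes on a CUDA GPU; the function names, the LoRA hyperparameters, and any model or repo ids passed in are illustrative choices, not something stated in this thread.

```python
def build_qlora_model(base_id):
    """Load a base model in 4-bit and wrap it with trainable LoRA adapters."""
    # Imported lazily: needs a CUDA GPU plus torch, transformers, peft,
    # bitsandbytes and accelerate installed.
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id, quantization_config=bnb_config, device_map="auto"
    )
    base_model = prepare_model_for_kbit_training(base_model)

    # Illustrative hyperparameters; some architectures also need an explicit
    # target_modules list.
    lora_config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
    )
    return get_peft_model(base_model, lora_config)


def push_adapter(peft_model, repo_id):
    """Push the PEFT-wrapped model; this uploads the adapter, not the 4-bit base."""
    peft_model.push_to_hub(repo_id)
    return repo_id
```

After training, push_adapter(peft_model, "your-username/your-adapter") would upload the adapter files, and the base model can be reloaded in 4-bit separately when the adapter is used.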

danielpark · October 7, 2023, 10:14am

Is this issue closed?

I have the same issue with the packages installed from GitHub (peft, transformers, etc.).

NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported.

RonanMcGovern · January 8, 2024, 12:53pm

This issue can probably be closed.

Here’s the issue to follow on this:

github.com/TimDettmers/bitsandbytes

Save and load in NF4 / FP4 formats (TimDettmers:main ← poedator:save4, opened by poedator at 08:29PM on 07 Sep 23 UTC, +376 -95)

Purpose: enable saving and loading transformers models in 4bit formats. Enables… this PR in transformers: https://github.com/huggingface/transformers/pull/26037. Addresses feature request #603 and other similar ones elsewhere. Tested with Bloom-560 and Llama-2-7b: tested quantization, saving, and loading; matched tensors and quant_states; matched inference results. Tested both nf4 and fp4. Test added.

nielsr · January 8, 2024, 2:49pm

This is now possible, see the GitHub gist “push bnb 4 bit models on the hub”.

Tweet: https://twitter.com/younesbelkada/status/1739244971905966380

RonanMcGovern · January 8, 2024, 6:28pm

Yeah, correct, with the caveat that you still need to install transformers from source as the latest stable release doesn’t support it. Hence why I tagged the issue above, cheers.
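As a rough sketch of what this looks like once a transformers build with 4-bit serialization is installed: load the model with a bitsandbytes 4-bit config and call push_to_hub as usual. This is not the contents of the linked gist; the helper names are hypothetical, the quantization settings are illustrative, and it assumes a CUDA GPU with a recent bitsandbytes and accelerate.

```python
# Install from source, as suggested above (run in a shell):
#   pip install git+https://github.com/huggingface/transformers.git


def load_4bit(model_id):
    """Load a causal LM with bitsandbytes 4-bit (NF4) quantization."""
    # Imported lazily: needs a CUDA GPU plus torch, transformers,
    # bitsandbytes and accelerate installed.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )


def push_4bit(model, repo_id):
    """Push a 4-bit model, turning the old error into an upgrade hint."""
    try:
        model.push_to_hub(repo_id)
    except NotImplementedError as err:
        # Older installs refuse to serialize 4-bit weights.
        raise RuntimeError(
            "This install cannot save 4-bit weights; try upgrading "
            "transformers and bitsandbytes."
        ) from err
    return repo_id
```

If push_4bit still raises, the installed versions most likely predate the serialization support discussed in the pull request above.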
