Hey there,
I am quantizing a model to 4-bit using bitsandbytes, and when I try to push the model to the Hub I get the following error:
You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported
I’ve pushed a 4-bit converted model to the Hub before, after finetuning it with PEFT, but I was wondering whether I can do it without going through that path.
I know GPTQ gives me more options, but it only works for text models. Alternatively, I could go through finetuning and use methods like pruning or distillation to make the model smaller, but I am wondering whether there is a workaround for pushing a 4-bit converted model to the Hub.
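For reference, here is a minimal sketch of the kind of code that hits this error. The model id and repo name below are placeholders, and the config uses the standard `BitsAndBytesConfig` 4-bit options in transformers:

```
import transformers

# load a causal LM in 4-bit via bitsandbytes
quant_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",  # placeholder model id
    quantization_config=quant_config,
)

# push_to_hub serializes via save_pretrained, so without 4-bit
# serialization support this raises:
# "You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported"
model.push_to_hub("my-user/my-4bit-model")  # hypothetical repo id
```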
The following PRs handle this issue:
TimDettmers/bitsandbytes PR #753 (TimDettmers:main ← poedator:save4), opened 07 Sep 2023 UTC:
Purpose: enable saving and loading transformers models in 4bit formats.
Enables this PR in transformers: https://github.com/huggingface/transformers/pull/26037
Addresses feature request #603 and other similar ones elsewhere.
Tested with Bloom-560 and Llama-2-7b: quantization, saving, and loading; matched tensors and quant_states; matched inference results.
Tested both nf4 and fp4; test added.
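As a quick illustration of the two 4-bit data types the PR tests, both are selectable through the quantization config in transformers (a minimal sketch, assuming the standard `BitsAndBytesConfig` API):

```
from transformers import BitsAndBytesConfig

# nf4 and fp4 are the two 4-bit quant types exercised in this PR
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
fp4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="fp4")
```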
huggingface/transformers PR #26037 (huggingface:main ← poedator:save4), opened 07 Sep 2023 UTC:
## What does this PR do?
Purpose: enable saving and loading transformers models in 4bit formats.
Tested with Bloom-560 and Llama-2-7b: save, load, match tensors and quant_states, match inference results.
## connection with bitsandbytes
Requires this PR in bitsandbytes: https://github.com/TimDettmers/bitsandbytes/pull/753 to be able to save/load models
## testing:
The functionality was tested with this series of commands:
```
import transformers

quant_config = transformers.BitsAndBytesConfig(load_in_4bit=True)  # 4-bit quant config
model = transformers.AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", quantization_config=quant_config)
SAVE_PATH = "./saved_4bit_model"  # placeholder path
model.save_pretrained(SAVE_PATH, safe_serialization=True)  # tested with both False and True
model2 = transformers.AutoModelForCausalLM.from_pretrained(SAVE_PATH)
# then matching all params and quant_state items between the models, matching inference results
```
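As a rough sketch of what the matching step could look like (the PR's actual tests may differ; this just compares parameters directly with `torch.equal`):

```
import torch

# compare all parameters between the original and the reloaded model
for (n1, p1), (n2, p2) in zip(model.named_parameters(), model2.named_parameters()):
    assert n1 == n2 and torch.equal(p1.data, p2.data), f"mismatch in {n1}"
```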
Specific tests will be added to this PR once the [bitsandbytes PR](https://github.com/TimDettmers/bitsandbytes/pull/753) merges.
## Open questions to the maintainers:
1. I added and tested the code necessary for straightforward save/load operations with LLMs, yet there may be other kinds of function calls that need updating to handle 4-bit save/load - please suggest if/where to expand this PR.
Specifically: this PR covers `load_state_dict_into_meta_model()` but not `load_state_dict_into_model()`, because I did not find an example that uses it.
2. Some of my edits may not fit the maintainers' style or refactoring plans - please give guidance if needed.
I will just wait until they are merged.
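Once both PRs land in transformers and bitsandbytes releases, the original failing call should work directly; a hedged sketch:

```
# with 4-bit serialization support merged, saving and pushing should just work
model.save_pretrained("./my-4bit-model")    # placeholder local path
model.push_to_hub("my-user/my-4bit-model")  # hypothetical repo id
```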