Hello,
I have downloaded Vicuna 7B (lmsys/vicuna-7b-v1.5 · Hugging Face) and am using the Hugging Face transformers library to run it locally. Everything runs fine until I try to save the model after quantizing it.
When I load my model locally this way and try to save it:
from transformers import AutoTokenizer, AutoModelForCausalLM, QuantoConfig
import torch
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"

# quantize the weights to int8 via transformers' quanto integration
quantization_config = QuantoConfig(weights="int8")
tokenizer = AutoTokenizer.from_pretrained("vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "vicuna-7b-v1.5",
    torch_dtype=torch.float32,
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
)
model.save_pretrained("./vicuna-7b-v1.5-quant-8bit")
It raises this error:
ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable
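As a possible workaround for this first error, I thought about bypassing save_pretrained and dumping the raw state dict with plain torch.save, roughly like the sketch below (the path is just an example, and I have not verified that quanto's frozen, quantized weights actually survive this round trip):
import os
import torch

# Sketch only: save the quantized model's state dict directly instead of using save_pretrained.
os.makedirs("./vicuna-7b-v1.5-quant-8bit", exist_ok=True)
torch.save(model.state_dict(), "./vicuna-7b-v1.5-quant-8bit/pytorch_model.bin")
tokenizer.save_pretrained("./vicuna-7b-v1.5-quant-8bit")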
When I load the model (locally) and apply 8-bit quantization with quanto along the way:
from transformers import AutoTokenizer, AutoModelForCausalLM
import quanto
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"

tokenizer = AutoTokenizer.from_pretrained("vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained("vicuna-7b-v1.5", low_cpu_mem_usage=True)

# quantize the weights to int8 in place, then freeze to materialize the quantized weights
quanto.quantize(model, weights=quanto.qint8, activations=None)
quanto.freeze(model)

model.save_pretrained("./vicuna-7b-v1.5-quant-8bit")
I see this error when trying to save it:
ValueError: do_sample is set to False. However, temperature is set to 0.9 – this flag is only used in sample-based generation modes. Set do_sample=True or unset temperature to continue.
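Following the message, I tried adjusting the model's generation config before saving, roughly like this (I may well be doing this part wrong):
# Attempted fix for the do_sample / temperature complaint, as suggested by the error message:
# either enable sampling, or unset the sampling-only parameter.
model.generation_config.do_sample = True
# alternatively: model.generation_config.temperature = None
model.save_pretrained("./vicuna-7b-v1.5-quant-8bit")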
But that does not help, and after a few hours on the HF forums and Stack Overflow I couldn't find a solution either. Some users hint that downgrading the transformers library fixes the second error, but that doesn't work for me.