Hi, I have quantized the flan-t5-base model to int8, and when I try to upload it to the Hugging Face Hub I keep getting this error:
```
ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable - check out the warnings from the logger on the traceback to understand the reason why the quantized model is not serializable.
```
Here's the code:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration, QuantoConfig

model_id = "google/flan-t5-base"
quantization_config = QuantoConfig(weights="int8")
quantized_model = T5ForConditionalGeneration.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)
quantized_model.push_to_hub("flan-t5-base-8bit")  # raises the ValueError above
```