How to upload a quantized model?

@sgugger @lewtun thanks for the reply.

The problem is that quantized weights alone are not enough for PyTorch INT8 inference. This is a limitation of the PyTorch quantization implementation, which only supports on-the-fly quantization followed by on-the-fly inference: an intermediate Python object ("q_config") is generated during quantization and used during inference, but PyTorch does not save this q_config object. If we want to use the quantized model later or offline, we need to load both the quantized weights and the q_config of each node, and that is not supported by PyTorch officially.
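To make the limitation concrete, here is a minimal sketch of the eager-mode post-training static quantization flow on a toy module (the wrapper class and file name are just placeholders, not anything from transformers): the INT8 model is usable immediately after convert(), but only its weights can be saved, so a later process has to replay the same qconfig/prepare/convert steps before it can load them.

```python
import torch
from torch.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class QuantizableWrapper(torch.nn.Module):
    """Toy wrapper adding quant/dequant stubs around an fp32 module."""
    def __init__(self, fp32_model):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> int8 at the model input
        self.model = fp32_model
        self.dequant = DeQuantStub()  # int8 -> fp32 at the model output

    def forward(self, x):
        return self.dequant(self.model(self.quant(x)))

fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
qmodel = QuantizableWrapper(fp32_model).eval()
qmodel.qconfig = get_default_qconfig("fbgemm")  # per-node quantization config
prepared = prepare(qmodel)                      # insert observers
prepared(torch.randn(4, 8))                     # calibration pass
int8_model = convert(prepared)                  # INT8 model, ready for inference right now

# Only the weights are persisted; the qconfig / quantized module structure is not:
torch.save(int8_model.state_dict(), "q_weights.pt")
# A plain fp32 model cannot load this state_dict; the same qconfig +
# prepare + convert steps have to be replayed before load_state_dict().
```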

As a result, if we want to upload a quantized model to the Hugging Face Hub and let users download and evaluate it through the Hugging Face API, we have to provide some code that reads the saved q_weights and q_config, rebuilds the quantized model object, and uses it for evaluation.
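Something along these lines, as a rough sketch only: load_quantized_model, build_fp32_model and the JSON layout of the saved q_config are all my assumptions here, not existing APIs.

```python
import json
import torch
from torch.quantization import get_default_qconfig, prepare, convert

def load_quantized_model(build_fp32_model, weights_path, qconfig_path):
    """Hypothetical helper: rebuild the quantized graph, then load the INT8 weights."""
    with open(qconfig_path) as f:
        q_config = json.load(f)                      # e.g. {"backend": "fbgemm"}
    model = build_fp32_model().eval()                # same architecture as at quantization time
    model.qconfig = get_default_qconfig(q_config["backend"])
    model = convert(prepare(model))                  # recreate the quantized module structure
    model.load_state_dict(torch.load(weights_path))  # now the INT8 state_dict fits
    return model
```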

So it involves some code contributions; I just want to confirm with your expertise that this is the right direction before we put any resources into it.

Possible code changes include:

  1. model definition changes (adding quant/dequant stubs for PyTorch imperative models and post-training static quantization), for example introducing a q_bert class in the huggingface repo, similar in spirit to the wrapper sketch above.
  2. AutoModelForSequenceClassification.from_pretrained('/path/to/quantized/pytorch/model_a') should accept an additional "q_config" parameter so that the returned model is rebuilt as a quantized model (see the usage sketch after this list).
  3. if we want users to be able to use pipeline(), then this function also needs to accept an additional q_config parameter.
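For items 2 and 3, the user-facing flow we have in mind would look roughly like this. The q_config parameter is the proposed addition from this post, not a current transformers API; the paths and file names are placeholders.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Proposed change 2: from_pretrained would take the saved q_config so it can
# rebuild the quantized modules before loading the INT8 weights.
model = AutoModelForSequenceClassification.from_pretrained(
    "/path/to/quantized/pytorch/model_a",
    q_config="/path/to/quantized/pytorch/model_a/q_config.json",  # proposed new parameter
)
tokenizer = AutoTokenizer.from_pretrained("/path/to/quantized/pytorch/model_a")

# Proposed change 3: pipeline() would forward the same q_config, so users can
# evaluate the INT8 model without touching quantization internals.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("This quantized model still works offline."))
```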

Appreciate any guidance.