Hi,
I have a locally finetuned version of the facebook/opt-13b model. I want to quantize it to shrink the model size and get faster inference. After spending a lot of time on it, I managed to convert the model to ONNX. I found one PR here; the code there works for all OPT variants except the opt-13b that I need. So after some discussion here, I made a few changes and finally got the model quantized. But the quantized model's output is not good at all. I asked here a few days ago but didn't get any answer.
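For context, the export step was along these lines (I'm reconstructing it as a sketch, so take the exact calls as approximate; the checkpoint path is a placeholder for my local finetuned model):

from optimum.onnxruntime import ORTModelForCausalLM

# Export the finetuned PyTorch checkpoint to ONNX via optimum.
# "path/to/finetuned-opt-13b" is a placeholder for my local checkpoint.
model = ORTModelForCausalLM.from_pretrained(
    "path/to/finetuned-opt-13b",
    export=True,  # run the PyTorch -> ONNX conversion on load
)
# Writes model.onnx plus external data files (the model is > 2 GB)
model.save_pretrained("onnx2")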
The quantization code:
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic (weight-only) quantization of the exported ONNX model.
quantize_dynamic(
    "onnx2/model.onnx",                 # input: the FP32 ONNX export
    "onnx-quantized2/model-int8.onnx",  # output: the INT8 model
    weight_type=QuantType.QUInt8,       # quantize weights to unsigned 8-bit
    use_external_data_format=True,      # model > 2 GB, so keep external weight files
)
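And this is roughly how I generate text from the quantized model (the decoding settings here are a simplification of what I actually ran):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-13b")
# Load the quantized model, pointing at the INT8 file explicitly.
model = ORTModelForCausalLM.from_pretrained(
    "onnx-quantized2",
    file_name="model-int8.onnx",
)

prompt = "This is an award winning short story titled The Drive."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))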
The quantized model's output:
This is an award winning short story titled The Drive. This story titled The A
A story A A A A A
A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A [... "A" repeated for the rest of the generation]
It should be something like this:
This is an award winning short story titled The Drive. This story is written with descriptive language, described in detail. This is the first chapter of The Drive.\n\nIn this chapter I am a professional driver. I am driving a car from San Francisco to Los Angeles. I have a female passenger who is a famous photographer. She is taking a photo of me as I drive.\n\n#action, #driving, #fiction, #funny, #funny, #driving,'
What do you think is the problem?