Improving Quantization Accuracy for ONNX Models with Optimum

Hi, I am currently interested in quantizing the Stable Diffusion model and running it on-device as an ONNX model.

I have successfully quantized the model by following Optimum's static quantization example here. However, I found that a model quantized with PyTorch's FX quantization API gives better image quality. I tried several calibration config options, but none made a big difference. I would like to use Optimum's API to produce a quantized model with better accuracy, because it retains layer names within the model, offers usable quantization options, and is easier to maintain.
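For reference, the FX path I am comparing against looks roughly like this. This is only a minimal sketch: the tiny `Sequential` module stands in for the UNet, and the `qnnpack` backend and default qconfig mapping are assumptions on my side, not necessarily what matters for the quality gap.

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# "qnnpack" targets ARM, mirroring the arm64 target on the ONNX side.
torch.backends.quantized.engine = "qnnpack"

# Toy stand-in for the UNet; the real model is traced the same way.
model = torch.nn.Sequential(
    torch.nn.Conv2d(4, 8, 3, padding=1),
    torch.nn.ReLU(),
).eval()

qconfig_mapping = get_default_qconfig_mapping("qnnpack")
example_inputs = (torch.randn(1, 4, 64, 64),)

# Insert observers into the traced graph.
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibration: run representative samples through the observed model.
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 4, 64, 64))

# Convert to an actual int8 graph.
quantized = convert_fx(prepared)
out = quantized(torch.randn(1, 4, 64, 64))
```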

I would appreciate any advice on improving the accuracy of this quantized model with Optimum, or any insight into how the quantization methods of Optimum and PyTorch FX differ.

Below is a snippet of the script I am using:

    from functools import partial

    from onnxruntime.quantization import QuantType
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig

    # Load the exported UNet and build the static quantization configuration.
    quantizer = ORTQuantizer.from_pretrained(f"{args.in_model_dir}/unet")
    qconfig = AutoQuantizationConfig.arm64(
        is_static=True,
        per_channel=False,
        nodes_to_exclude=UNET_NODES_TO_EXCLUDE,
    )
    qconfig.weights_dtype = QuantType.QInt8
    qconfig.activations_dtype = QuantType.QInt8

    # Build the calibration dataset used to estimate activation ranges.
    calibration_dataset = quantizer.get_calibration_dataset(
        dataset_name=dataset_path,
        preprocess_function=partial(preprocess_unet),
        num_samples=dataset_num,
        dataset_split="train",
        preprocess_batch=False,
    )

    # Create the calibration configuration (min-max with a moving average).
    calibration_config = AutoCalibrationConfig.minmax(
        calibration_dataset, moving_average=True, averaging_constant=0.01
    )

    # Perform the calibration step: compute the activation quantization ranges.
    ranges = quantizer.fit(
        dataset=calibration_dataset,
        calibration_config=calibration_config,
        operators_to_quantize=qconfig.operators_to_quantize,
        use_external_data_format=True,
    )

    # Apply static quantization to the model.
    model_quantized_path = quantizer.quantize(
        save_dir=args.out_model_dir + "/unet_quantized",
        calibration_tensors_range=ranges,
        quantization_config=qconfig,
    )

Thank you.