Hi, I'm interested in quantizing a Stable Diffusion model and running it as an ONNX model on a device.
I have successfully quantized the model by following Optimum's static quantization example here. However, I found that the model quantized with PyTorch's FX Graph Mode Quantization API produces better image quality. I have tried several calibration configuration options, but none of them made a significant difference. I would prefer to use Optimum's API to produce a more accurate quantized model, because it preserves layer names within the model, offers convenient quantization options, and is easier to maintain.
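To compare the Optimum and FX models more objectively than by eye, one option is to run the FP32 and int8 UNets on the same latents, timestep and text embeddings and compare the raw noise predictions. The minimal sketch below uses plain Python and toy values rather than real model outputs; `sqnr_db` and `cosine_similarity` are hypothetical helper names of my own. It computes the signal-to-quantization-noise ratio and the cosine similarity between a reference output and a quantized one. Since the UNet runs once per denoising step, small per-step differences can accumulate in the final image.

```python
import math
from typing import Sequence


def sqnr_db(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Signal-to-quantization-noise ratio in dB; higher means closer to FP32."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    signal = sum(r * r for r in reference)
    noise = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    return math.inf if noise == 0.0 else 10.0 * math.log10(signal / noise)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened outputs (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)


# Toy values standing in for flattened UNet noise predictions on the same input.
fp32_output = [0.12, -0.40, 0.95, -1.30, 0.05]
int8_output = [0.10, -0.38, 0.90, -1.25, 0.07]
print(f"SQNR: {sqnr_db(fp32_output, int8_output):.1f} dB")
print(f"cosine similarity: {cosine_similarity(fp32_output, int8_output):.4f}")
```

With real models, the ONNX Runtime output arrays would be flattened first (e.g. with `.ravel().tolist()`), and the numbers averaged over several inputs and timesteps before comparing calibration settings.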
I would appreciate any advice on improving the accuracy of this quantized model with Optimum, or any insight into how Optimum's quantization differs from PyTorch FX's.
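One difference that may be worth checking: if the FX model was prepared with `get_default_qconfig_mapping("x86")` or `get_default_qconfig_mapping("fbgemm")`, PyTorch quantizes weights per output channel and observes activations with `HistogramObserver`, whereas the snippet below sets `per_channel=False`, so every weight tensor shares one scale. The pure-Python sketch below (toy weights, not taken from the UNet; the function names are my own) illustrates why that can matter: with a single shared scale, a channel with small weights is rounded onto only a few int8 levels.

```python
from typing import List

Matrix = List[List[float]]  # one row per output channel


def _scale_for(max_abs: float) -> float:
    """Symmetric int8 scale mapping [-max_abs, max_abs] onto [-127, 127]."""
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize_dequantize_symmetric(values: List[float], scale: float) -> List[float]:
    """Quantize to int8 with zero-point 0 (clamped to [-127, 127]), then dequantize."""
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def weight_mse(weights: Matrix, per_channel: bool) -> float:
    """Mean squared error introduced by int8 weight quantization."""
    if per_channel:
        # One scale per output channel, taken from that channel's own max |w|.
        scales = [_scale_for(max(abs(w) for w in row)) for row in weights]
    else:
        # A single scale for the whole tensor, set by the largest |w| anywhere.
        shared = _scale_for(max(abs(w) for row in weights for w in row))
        scales = [shared] * len(weights)
    squared_errors = [
        (w - w_hat) ** 2
        for row, s in zip(weights, scales)
        for w, w_hat in zip(row, quantize_dequantize_symmetric(row, s))
    ]
    return sum(squared_errors) / len(squared_errors)


# Toy weights: channel 0 has small values, channel 1 has large ones.
toy_weights = [[0.010, -0.020, 0.015, -0.005], [1.5, -2.0, 0.8, -1.2]]
print("per-tensor MSE :", weight_mse(toy_weights, per_channel=False))
print("per-channel MSE:", weight_mse(toy_weights, per_channel=True))
```

If this turns out to be a factor, `per_channel=True` in `AutoQuantizationConfig.arm64(...)` would be an obvious first thing to try, assuming the target runtime supports per-channel weights.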
Below is a snippet of the script I am using:
```python
from functools import partial

from onnxruntime.quantization import QuantType
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig

# args, dataset_path, dataset_num, preprocess_unet and UNET_NODES_TO_EXCLUDE
# are defined elsewhere in my script.

quantizer = ORTQuantizer.from_pretrained(f"{args.in_model_dir}/unet")
# Static (calibrated) quantization config with per-tensor weights.
qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False, nodes_to_exclude=UNET_NODES_TO_EXCLUDE)
qconfig.weights_dtype = QuantType.QInt8
qconfig.activations_dtype = QuantType.QInt8

# Build the calibration dataset.
calibration_dataset = quantizer.get_calibration_dataset(
    dataset_name=dataset_path,
    preprocess_function=partial(preprocess_unet),
    num_samples=dataset_num,
    dataset_split="train",
    preprocess_batch=False,
)
# Create the calibration configuration (MinMax with moving average).
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset, moving_average=True, averaging_constant=0.01)
# Perform the calibration step: compute the activation quantization ranges.
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    operators_to_quantize=qconfig.operators_to_quantize,
    use_external_data_format=True,
)
# Apply static quantization using the calibrated ranges.
model_quantized_path = quantizer.quantize(
    save_dir=f"{args.out_model_dir}/unet_quantized",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)
```
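On the calibration side, the snippet uses `AutoCalibrationConfig.minmax(..., moving_average=True, averaging_constant=0.01)`. I'm not sure exactly how ONNX Runtime's MinMax calibrator applies the moving average internally, so the sketch below is only a simplified stand-in with hypothetical function names: it contrasts the global min/max over all calibration batches with the mean of per-batch minima and maxima, then derives the asymmetric int8 scale and zero-point each range would give. The tighter averaged range gives finer steps for typical activations but clips outliers, which is the kind of trade-off histogram-based observers try to balance.

```python
from typing import List, Tuple

Batches = List[List[float]]  # flattened activations of one tensor, one list per batch


def global_minmax(batches: Batches) -> Tuple[float, float]:
    """Range covering every value seen during calibration."""
    return min(min(b) for b in batches), max(max(b) for b in batches)


def averaged_minmax(batches: Batches) -> Tuple[float, float]:
    """Mean of per-batch minima and maxima; rare outliers get averaged down."""
    n = len(batches)
    return sum(min(b) for b in batches) / n, sum(max(b) for b in batches) / n


def asymmetric_int8_params(rmin: float, rmax: float) -> Tuple[float, int]:
    """Scale and zero-point that map [rmin, rmax] onto signed int8 [-128, 127]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the range must include 0
    scale = (rmax - rmin) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(-128 - rmin / scale)
    return scale, max(-128, min(127, zero_point))


def clipped_fraction(batches: Batches, rmin: float, rmax: float) -> float:
    """Share of calibration values that fall outside [rmin, rmax]."""
    values = [v for b in batches for v in b]
    return sum(1 for v in values if v < rmin or v > rmax) / len(values)


# Toy activations with one outlier (12.0) in the last batch.
toy_batches = [[-1.0, 0.5, 2.0], [-0.8, 0.3, 1.8], [-1.1, 0.4, 12.0]]
for name, (lo, hi) in [("global", global_minmax(toy_batches)),
                       ("averaged", averaged_minmax(toy_batches))]:
    scale, zp = asymmetric_int8_params(lo, hi)
    print(f"{name:8s} range=({lo:.3f}, {hi:.3f}) scale={scale:.4f} "
          f"zero_point={zp} clipped={clipped_fraction(toy_batches, lo, hi):.2f}")
```

Optimum also exposes `AutoCalibrationConfig.entropy(...)` and `AutoCalibrationConfig.percentiles(...)`, which are histogram-based like FX's `HistogramObserver` and might be worth comparing against MinMax.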
Thank you.