I exported a CLIP model to ONNX using:
optimum-cli export onnx -m laion/CLIP-ViT-L-14-laion2B-s32B-b82K --framework pt clip_onnx
I then tried to quantize it with:
optimum-cli onnxruntime quantize --onnx_model clip_onnx/ --arm64 -o quantized_clip_onnx
However, I am not sure how to run inference with the resulting quantized_clip_onnx model.
Most code samples start from something like:
ORTModelForSequenceClassification.from_pretrained(…)
However, there doesn't seem to be an ORTModel*** class for the CLIP family of models. Has anyone succeeded with this, and could you point me to a solution?
Thanks.