Optimize AND quantize with Optimum

Hi everyone,

I would like to optimize AND quantize a NER model I have fine-tuned. I see that the Optimum library makes it easy to do each thing separately, but I still haven't managed to optimize and quantize the same model. With ONNX Runtime directly this was possible and easy to do. How can it be done with Optimum?

Thanks!!

You can check out this blog post: Optimizing Transformers with Hugging Face Optimum
or the documentation: 🤗 Optimum

Hi Phil,

Thank you for your answer. The first blog post seems to solve my question. However, I think it uses an old version of Optimum with attributes that current versions don't have. When exporting the optimized/quantized model, the blog post uses the export method. But when I do the same (using version 1.4.0), it gives me an error saying 'ORTOptimizer' object has no attribute 'export'. I have searched the documentation but I haven't found anything similar to the article for more recent versions.

Thanks again!!

Take a look at the documentation; it covers the API changes for version 1.4.0, e.g. for quantization here: Quantization

The problem I see with the latest version (1.4.0) compared to older versions is that it doesn't implement an export method, nor anything similar that could optimize a quantized model or quantize an optimized model.

As far as I understand, version 1.4.0 only lets you go from model.onnx to model-optimized.onnx, or from model.onnx to model-quantized.onnx, but then you can't quantize model-optimized.onnx or optimize model-quantized.onnx to get a model-optimized-quantized.onnx, whereas in other versions you could with export (such as in the blog post you linked).

Will it be possible to do this in future versions? I think having a model both quantized and optimized was a great solution, as it saved a lot of space while achieving almost the same results as the original model.

Thanks!!

@jorgealro, that is already possible and supported: you can provide the file_name when loading an ORTModel or when creating an Optimizer/Quantizer. This is explained and documented in our documentation.

Yes, that works for me :slight_smile:

Thank you very much!!
