Optimize AND quantize with Optimum

Hi everyone,

I would like to optimize AND quantize a NER model I have fine-tuned. I see that with the Optimum library it is easy to do both things separately, but I still haven’t managed to optimize and quantize the same model. With ONNX Runtime directly it was possible and easy to do. How can this be done with Optimum?

Thanks!!

You can check out this blog post: Optimizing Transformers with Hugging Face Optimum
or the documentation: 🤗 Optimum

Hi Phil,

Thank you for your answer. The first blog post seems to solve my question. However, I think it uses an old version of Optimum that has some attributes the current one doesn’t. When exporting the optimized/quantized model, the blog post uses the export attribute. But when I do the same (using version 1.4.0), it gives me an error saying 'ORTOptimizer' object has no attribute 'export'. I have searched the documentation but I haven’t found anything similar to the article for more recent versions.

Thanks again!!

Take a look at the documentation; it includes the API changes for version 1.4.0, e.g. for quantization here: Quantization

The problem I see with the latest version (1.4.0) compared to older versions is that it doesn’t have an export attribute, nor anything similar that could optimize a quantized model or quantize an optimized model.

As far as I understand, version 1.4.0 only lets you go from model.onnx to model-optimized.onnx or from model.onnx to model-quantized.onnx, but then you can’t quantize model-optimized.onnx or optimize model-quantized.onnx to get a model-optimized-quantized.onnx, while in other versions you could, using export (such as in the blog post you linked).

Will it be possible to do this in future versions? I think having a model both optimized and quantized was a great solution, as it reduced the size a lot and achieved results almost identical to the original model.

Thanks!!

@jorgealro, that is already possible and supported: you can provide the file_name when either loading an ORTModel or creating an Optimizer/Quantizer. This is explained and documented in our documentation.
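For illustration, here is a minimal sketch of that chaining (the model ID, folder names and config values are placeholders; the exact arguments may differ slightly between Optimum versions, e.g. newer releases use export=True instead of from_transformers=True):

from optimum.onnxruntime import ORTModelForTokenClassification, ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

# Export the fine-tuned model to ONNX and optimize the graph
model = ORTModelForTokenClassification.from_pretrained('my-ner-model', from_transformers=True)
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    optimization_config=OptimizationConfig(optimization_level=99),
    save_dir='onnx',
)

# Quantize the *optimized* graph by pointing file_name at it
quantizer = ORTQuantizer.from_pretrained('onnx', file_name='model_optimized.onnx')
quantizer.quantize(
    quantization_config=AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False),
    save_dir='onnx',
)

This should leave a model_optimized_quantized.onnx next to the other files in the output folder.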

Yes, that works for me 🙂

Thank you very much!!

Hi @jorgealro,
could you please explain how you solved the 'ORTOptimizer' object has no attribute 'export' issue? I’m facing the same problem trying to optimize a CrossEncoder with optimum 1.7.3.
Many thanks in advance!

I found the solution; it might be useful to someone until the documentation gets updated (the currently available version is for v1.3.0).
As of optimum==1.7.3, you should use the optimize method instead of the export one:

from optimum.onnxruntime import ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

optimizer = ORTOptimizer.from_pretrained('model_name_or_path')
optimizer.optimize(
    optimization_config=OptimizationConfig(optimization_level=1),  # required in recent versions
    save_dir='output_folder',
)
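If I’m not mistaken, optimize writes a model_optimized.onnx into save_dir; to also quantize it (the original topic of this thread), you can then create an ORTQuantizer for that folder with file_name='model_optimized.onnx', as mentioned earlier in the thread.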

Hi @f-dig, the documentation seems to have moved: Optimization. Sorry for the dead link!

Had the same problem… and a bunch of other issues from the tutorial: Accelerated Inference with Optimum and Transformers Pipelines…

An easy way to solve it: delete all the arguments, run, and use the logged messages (CLI output) to find out the correct arguments.

I managed to make the example from the blog Accelerated Inference with Optimum and Transformers Pipelines work with the advice from this thread, but since this week the code breaks when quantizing the model, with this error:

RuntimeError: Unable to find data type for weight_name='/roberta/encoder/layer.0/attention/output/dense/MatMul_output_0'

I have created a new post explaining the steps to reproduce the error: Optimum library optimization and quantization fails - #2 by ddahlmeier