Optimize AND quantize with Optimum

Hi everyone,

I would like to optimize AND quantize a NER model I have fine-tuned. I see that with the Optimum library it is easy to do both things separately, but I still haven’t managed to optimize and quantize the same model. With ONNX Runtime directly it was possible and easy to do. How can this be done with Optimum?

Thanks!!

You can check out this blog post: Optimizing Transformers with Hugging Face Optimum
or the documentation: 🤗 Optimum

Hi Phil,

Thank you for your answer. The first blog post seems to solve my question. However, I think it uses an old version of Optimum that has some attributes the current one doesn’t. When exporting the optimized/quantized model, the blog post uses the export attribute. But when I do the same (using version 1.4.0), it gives me an error saying 'ORTOptimizer' object has no attribute 'export'. I have searched the documentation but I haven’t found anything similar to the article for more recent versions.

Thanks again!!

Take a look at the documentation; it includes the API changes for version 1.4.0, e.g. for quantization here: Quantization

The problem I see with the latest version (1.4.0) compared to older versions is that it doesn’t have an export attribute, nor anything similar that could optimize a quantized model or quantize an optimized model.

As far as I understand, version 1.4.0 only lets you go from model.onnx to model-optimized.onnx or from model.onnx to model-quantized.onnx, but then you can’t quantize model-optimized.onnx or optimize model-quantized.onnx to get a model-optimized-quantized.onnx, while in other versions you could, using export (such as in the blog post you linked).

Will it be possible to do this in future versions? I think having a model both optimized and quantized was a great solution, as it reduced the size a lot and achieved results almost identical to the original model.

Thanks!!

@jorgealro, that is already possible and supported: you can provide the file_name when either loading an ORTModel or creating an Optimizer/Quantizer. This is explained and documented in our documentation.
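For illustration, here is a minimal sketch of that chaining (the model ID, folder names and config values are placeholders; the exact arguments may differ slightly between Optimum versions, e.g. newer releases use export=True instead of from_transformers=True):

from optimum.onnxruntime import ORTModelForTokenClassification, ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

# Export the fine-tuned model to ONNX and optimize the graph
model = ORTModelForTokenClassification.from_pretrained('my-ner-model', from_transformers=True)
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    optimization_config=OptimizationConfig(optimization_level=99),
    save_dir='onnx',
)

# Quantize the *optimized* graph by pointing file_name at it
quantizer = ORTQuantizer.from_pretrained('onnx', file_name='model_optimized.onnx')
quantizer.quantize(
    quantization_config=AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False),
    save_dir='onnx',
)

This should leave a model_optimized_quantized.onnx next to the other files in the output folder.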

Yes, that works for me 🙂

Thank you very much!!

Hi @jorgealro,
could you please explain how you solved the 'ORTOptimizer' object has no attribute 'export' issue? I’m facing the same problem trying to optimize a CrossEncoder with optimum 1.7.3.
Many thanks in advance!

I found the solution; it might be useful to someone until the documentation gets updated (the currently available version is for v1.3.0).
As of optimum==1.7.3, you should use the optimize method instead of the export one:

from optimum.onnxruntime import ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

optimizer = ORTOptimizer.from_pretrained('model_name_or_path')
optimizer.optimize(
    optimization_config=OptimizationConfig(optimization_level=1),  # required in recent versions
    save_dir='output_folder',
)
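If I’m not mistaken, optimize writes a model_optimized.onnx into save_dir; to also quantize it (the original topic of this thread), you can then create an ORTQuantizer for that folder with file_name='model_optimized.onnx', as mentioned earlier in the thread.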

Hi @f-dig, the documentation seems to have moved: Optimization. Sorry for the dead link!

Had the same problem… and a bunch of other issues from the tutorial: Accelerated Inference with Optimum and Transformers Pipelines…

An easy way to solve it: delete all the arguments, run, and use the logged messages (CLI output) to find out the correct arguments.

I managed to make the example from the blog Accelerated Inference with Optimum and Transformers Pipelines work with the advice from this thread, but since this week the code breaks when quantizing the model, with this error:

RuntimeError: Unable to find data type for weight_name='/roberta/encoder/layer.0/attention/output/dense/MatMul_output_0'

I have created a new post explaining the steps to reproduce the error: Optimum library optimization and quantization fails - #2 by ddahlmeier