Optimize AND quantize with Optimum

I managed to get the example from the blog post Accelerated Inference with Optimum and Transformers Pipelines working with the advice from this thread, but since this week the code breaks when quantizing the model with the following error:

RuntimeError: Unable to find data type for weight_name='/roberta/encoder/layer.0/attention/output/dense/MatMul_output_0'
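For reference, this is roughly the flow I am running, adapted from the blog post. It is only a sketch: the model id and paths are placeholders (my actual model is a fine-tuned RoBERTa checkpoint), the ONNX export is assumed to already exist at onnx/model.onnx, and the exact Optimum API may differ depending on the installed version.

```python
from pathlib import Path

from optimum.onnxruntime import ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

# placeholders -- adjust to your own checkpoint and output directory
model_id = "path/to/my-roberta-checkpoint"
onnx_path = Path("onnx")

# step 1: graph optimization of the already-exported ONNX model
optimizer = ORTOptimizer.from_pretrained(model_id, feature="sequence-classification")
optimization_config = OptimizationConfig(optimization_level=99)
optimizer.export(
    onnx_model_path=onnx_path / "model.onnx",
    onnx_optimized_model_output_path=onnx_path / "model-optimized.onnx",
    optimization_config=optimization_config,
)

# step 2: dynamic quantization of the optimized model -- this is the step
# where the RuntimeError about the missing data type for the MatMul output
# is raised for me
quantizer = ORTQuantizer.from_pretrained(model_id, feature="sequence-classification")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
quantizer.export(
    onnx_model_path=onnx_path / "model-optimized.onnx",
    onnx_quantized_model_output_path=onnx_path / "model-quantized.onnx",
    quantization_config=qconfig,
)
```

As far as I can tell, the failure only shows up when the quantizer is pointed at the optimized model (model-optimized.onnx); quantizing the unoptimized export seems unaffected.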

I have created a new post explaining the steps to reproduce the error: Optimum library optimization and quantization fails - #2 by ddahlmeier