Breaking changes and issues with Optimum v1.1.0

Hello there! I'm using HF's Optimum library, version 1.1.0, as it's the latest release, and I've bumped into some problems. I'm only using the ONNX part of the library.

First of all, the docs seem to list 1.0.0 as the last stable release, if I'm not mistaken, and there are some minimal breaking changes, e.g. v1.1.0's "quantizer._onnx_config" instead of "quantizer.onnx_config", which is what the docs mention.

Second, I'm trying to save the ORTConfig to disk as shown in the docs/examples/notebooks, but when I call ORTConfig.from_* (from any source: JSON, dict, or disk), the library doesn't instantiate an Optimum config. Instead it loads a transformers config that doesn't recognise any of the Optimum keys ("optimum_version", "quantization", "optimization", etc.), so it's impossible to load and run the quantized model because I can't load its config.
My personal hotfix is to dump and load quantizer._onnx_config with pickle (sketched below), which is not a real solution for plenty of reasons.
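
To make it concrete, here is roughly the flow I have. The paths, model name, and keyword arguments are placeholders, and the ORTQuantizer instantiation just follows the v1.1.0 examples as I understand them, so treat this as an illustration rather than exact code:

```python
import pickle

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import ORTConfig

# What I would like to work: save the ORT config next to the quantized model...
ort_config = ORTConfig()  # in my real code this holds the quantization/optimization settings
ort_config.save_pretrained("quantized_model/")

# ...and reload it later. Instead, what comes back behaves like a plain transformers config
# that doesn't recognise the Optimum keys ("optimum_version", "quantization", "optimization").
reloaded_config = ORTConfig.from_pretrained("quantized_model/")

# My hotfix instead: pickle the quantizer's internal ONNX config after quantizing...
quantizer = ORTQuantizer.from_pretrained("distilbert-base-uncased", feature="sequence-classification")
# ... (export/quantization calls omitted) ...
with open("onnx_config.pkl", "wb") as f:
    pickle.dump(quantizer._onnx_config, f)

# ...and unpickle it whenever the quantized model needs to be loaded again.
with open("onnx_config.pkl", "rb") as f:
    onnx_config = pickle.load(f)
```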

Last (though this one is more of a classic GitHub issue and I will post it there), there is no option to load a custom dataset that was saved to disk rather than shared on the Hub: Optimum's quantization.py calls load_dataset(), which cannot load from disk. Replacing that line with a condition that falls back to load_from_disk when the dataset is not found on the Hub fixes the issue; a sketch of that fallback is below.
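
For reference, the patch I applied is essentially the fallback below, shown here as a standalone helper rather than the in-place change to quantization.py (the exception handling is simplified):

```python
from datasets import load_dataset, load_from_disk


def load_calibration_dataset(dataset_name_or_path, **load_dataset_kwargs):
    """Load a dataset from the Hub if possible, otherwise fall back to a local
    dataset that was previously saved with `Dataset.save_to_disk()`."""
    try:
        return load_dataset(dataset_name_or_path, **load_dataset_kwargs)
    except FileNotFoundError:
        # Not found on the Hub (or not a dataset script): treat it as a local path.
        return load_from_disk(dataset_name_or_path)
```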

Has anyone encountered similar problems? Or has anyone managed to actually quantize with Optimum, save everything, and load it from scratch in order to make predictions using HF's stack (not directly with ONNX using just the exported model)?

Hi @aneof

Thanks for reporting the issues you have been encountering.

The issue with instantiating a pretrained ORTConfig has been fixed by PR#156.

Also, the ORTConfig handles all of the ONNX Runtime parameters (such as the optimization and quantization parameters).
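
As a rough illustration (the exact class names and fields can differ between versions, so check the optimum.onnxruntime.configuration module for your release):

```python
from optimum.onnxruntime.configuration import (
    AutoQuantizationConfig,
    OptimizationConfig,
    ORTConfig,
)

# Build the ONNX Runtime specific settings...
quantization_config = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
optimization_config = OptimizationConfig(optimization_level=99)

# ...and group them in a single ORTConfig, which can be saved alongside the model
# and reloaded when needed.
ort_config = ORTConfig(optimization=optimization_config, quantization=quantization_config)
ort_config.save_pretrained("quantized_model/")
```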

The configuration you are mentioning with quantizer._onnx_config is actually the ONNX configuration, which describes the metadata on how to export the model to the ONNX format (see here for more details).

When doing inference, we shouldn't need the latter, and this will soon be solved by PR#113.

Finally, concerning the calibration dataset, you don't need to use the get_calibration_dataset method: you can load your dataset yourself and then pass it as an argument when instantiating your AutoCalibrationConfig, as done in the examples.
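
For instance, assuming a dataset saved locally with save_to_disk and the min-max calibration method (the path and split are placeholders):

```python
from datasets import load_from_disk

from optimum.onnxruntime.configuration import AutoCalibrationConfig

# Load the locally saved dataset yourself instead of calling get_calibration_dataset
calibration_dataset = load_from_disk("path/to/my_dataset")["train"]

# Pass it directly when creating the calibration configuration
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
```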