Getting ValueError when exporting model to ONNX using optimum

I’m trying to export my fine-tuned BERT classifier PyTorch model to ONNX format using Optimum, and eventually I want to run it through a pipeline for a sequence classification task.

Here is the code:

from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.pipelines import pipeline

# path_to_fine_tuned_model contains the path to the folder containing the pytorch_model.bin file
optimizer = ORTOptimizer.from_pretrained(path_to_fine_tuned_model, feature="sequence-classification") 
optimization_config = OptimizationConfig(optimization_level=2)

optimizer.export(
    onnx_model_path='../models/bert_model_opt.onnx',
    onnx_optimized_model_output_path='../models/bert_model_optimized.onnx',
    optimization_config=optimization_config,
)

However, this operation throws ValueError: Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [35], in <cell line: 1>()
----> 1 optimizer.export(
      2     onnx_model_path='../models/bert_model_opt.onnx',
      3     onnx_optimized_model_output_path='../models/bert_model_optimized.onnx',
      4     optimization_config=optimization_config,
      5 )

File /opt/conda/envs/conda_ml/lib/python3.9/site-packages/optimum/onnxruntime/optimization.py:123, in ORTOptimizer.export(self, onnx_model_path, onnx_optimized_model_output_path, optimization_config, use_external_data_format)
    121 # Export the model if it has not already been exported to ONNX IR
    122 if not onnx_model_path.exists():
--> 123     export(self.preprocessor, self.model, self._onnx_config, self.opset, onnx_model_path)
    125 ORTConfigManager.check_supported_model_or_raise(self._model_type)
    126 num_heads = getattr(self.model.config, ORTConfigManager.get_num_heads_name(self._model_type))

File /opt/conda/envs/conda_ml/lib/python3.9/site-packages/transformers/onnx/convert.py:336, in export(preprocessor, model, config, opset, output, tokenizer, device)
    330         logger.warning(
    331             f"Unsupported PyTorch version for this model. Minimum required is {config.torch_onnx_minimum_version},"
    332             f" got: {torch_version}"
    333         )
    335 if is_torch_available() and issubclass(type(model), PreTrainedModel):
--> 336     return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
    337 elif is_tf_available() and issubclass(type(model), TFPreTrainedModel):
    338     return export_tensorflow(preprocessor, model, config, opset, output, tokenizer=tokenizer)

File /opt/conda/envs/conda_ml/lib/python3.9/site-packages/transformers/onnx/convert.py:143, in export_pytorch(preprocessor, model, config, opset, output, tokenizer, device)
    139         setattr(model.config, override_config_key, override_config_value)
    141 # Ensure inputs match
    142 # TODO: Check when exporting QA we provide "is_pair=True"
--> 143 model_inputs = config.generate_dummy_inputs(preprocessor, framework=TensorType.PYTORCH)
    144 device = torch.device(device)
    145 if device.type == "cuda" and torch.cuda.is_available():

File /opt/conda/envs/conda_ml/lib/python3.9/site-packages/transformers/onnx/config.py:347, in OnnxConfig.generate_dummy_inputs(self, preprocessor, batch_size, seq_length, num_choices, is_pair, framework, num_channels, image_width, image_height, tokenizer)
    345     return dict(preprocessor(images=dummy_input, return_tensors=framework))
    346 else:
--> 347     raise ValueError(
    348         "Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
    349     )

ValueError: Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor.

Any idea how to fix this? Thanks!

Hi @AmoghM! Thanks for using Optimum.

According to the traceback you provided, no tokenizer or preprocessor was found. In the folder containing your pytorch_model.bin file, is there a JSON file that defines your tokenizer/preprocessor?
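For example, a minimal way to add them (assuming the classifier was fine-tuned from bert-base-uncased; substitute whatever base checkpoint you actually started from) is to save the tokenizer into that same folder:

from transformers import AutoTokenizer

# Assumption: fine-tuning started from "bert-base-uncased"; use your actual base checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Writes tokenizer_config.json, vocab.txt, etc. next to pytorch_model.bin,
# so the exporter can generate dummy inputs from them.
tokenizer.save_pretrained(path_to_fine_tuned_model)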


Thanks! The tokenizer files weren’t there. After adding them and re-running, the ValueError went away, but now it is throwing this error message:

2022-08-17 21:30:22.151083668 [W:onnxruntime:, inference_session.cc:1546 Initialize] Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled. The generated model may contain hardware specific optimizations, and should only be used in the same environment the model was optimized in.
symbolic shape infer failed. it's safe to ignore this message if there is no issue with optimized model
(the line above is repeated 25 times)
failed in shape inference <class 'AssertionError'>
failed in shape inference <class 'AssertionError'>

However, what is even stranger is that even after the AssertionError, the models were exported. And the inference latency for the initial ONNX model, the optimized ONNX model and the quantized ONNX model is about the same; I was expecting it to decrease.
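For what it’s worth, a rough way to compare the latencies could look like the sketch below (the tokenizer location, the task name and the file names are assumptions based on the snippets in this thread, not the exact script used):

import time
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline

# Assumption: the tokenizer files now live next to pytorch_model.bin.
tokenizer = AutoTokenizer.from_pretrained(path_to_fine_tuned_model)

# Compare the plain export against the optimized one; file names taken from the snippet above.
for onnx_file in ["bert_model_opt.onnx", "bert_model_optimized.onnx"]:
    model = ORTModelForSequenceClassification.from_pretrained("../models", file_name=onnx_file)
    pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
    start = time.perf_counter()
    for _ in range(100):
        pipe("This is a test sentence.")
    print(onnx_file, "avg latency:", (time.perf_counter() - start) / 100, "s")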

Which versions of Optimum, ONNX and ONNX Runtime are you using?

Here it is:

onnxruntime-gpu==1.12.1
optimum==1.3.0
onnx==1.12.0

Hi @AmoghM, is your model accessible on the Hugging Face Hub so that I can try it on my side?

Hi, any luck on this? I am also facing a similar issue.

Hi @mineshj1291! Could you share the optimization config you used please? Also if you can tell me the model type and the task I’ll try it on my side.

I am trying with a GPT-2 model for the causal-lm task.

I wasn’t able to reproduce the issue. Could you try this code snippet, or tell me how yours differs from it?

from optimum.onnxruntime import ORTOptimizer, ORTModelForCausalLM
from optimum.onnxruntime.configuration import OptimizationConfig

model_id = "gpt2"
save_dir = "/tmp/outputs"

model = ORTModelForCausalLM.from_pretrained(model_id, from_transformers=True)

optimizer = ORTOptimizer.from_pretrained(model)

optimization_config = OptimizationConfig(optimization_level=2)

optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)

I ran it with the following versions:

optimum==1.4.1
onnxruntime==1.13.1

Thanks, it worked. My optimum version was older.

Now the other issue I am facing is with the optimized ONNX model: about a third of its nodes run on the CPU, which has become a bottleneck for overall throughput.

This is how I am loading the model:

model = ORTModelForCausalLM.from_pretrained(model_id=model_id, file_name="model_optimized.onnx", provider="CUDAExecutionProvider")
onnx_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0, max_length=653)
output = onnx_pipe(text_inputs=in_text)

Could you try after installing Optimum from source? You can do it as follows:

git clone https://github.com/huggingface/optimum.git
cd optimum/
pip install .

We recently added IOBinding in Optimum. A new version with this change will be released very soon, but for now it is only available with a source install. You don’t have to change anything in your script; it is used by default with the CUDAExecutionProvider.
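If you want to verify the setup on your side, a quick sanity check could look like this (treating model.model as the underlying InferenceSession is an assumption about Optimum’s internals):

import optimum
import onnxruntime

print(optimum.__version__)                    # should reflect the source install
print(onnxruntime.get_available_providers())  # CUDAExecutionProvider should be listed

# Assumption: the ORTModel exposes the underlying onnxruntime.InferenceSession as `model.model`.
print(model.model.get_providers())            # providers actually used by the loaded session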


Thanks, it actually helped improve the performance of the ONNX model.

But it still has a bit more latency (15.6 s, about 2 s more) than when I use the transformers model (13.5 s per sequence generation).
It also shows the following warning:

 [W:onnxruntime:, session_state.cc:1030 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
 [W:onnxruntime:, session_state.cc:1032 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
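In case it helps, one way to see the node assignments the warning mentions is to load the model with verbose session options; passing session_options through from_pretrained is an assumption about the signature:

import onnxruntime as ort
from optimum.onnxruntime import ORTModelForCausalLM

sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0  # verbose logging, prints node-to-provider assignments

# Assumption: from_pretrained forwards session_options to the underlying InferenceSession.
model = ORTModelForCausalLM.from_pretrained(
    model_id,
    file_name="model_optimized.onnx",
    provider="CUDAExecutionProvider",
    session_options=sess_options,
)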

Hey @mineshj1291,

When using the transformers model, did you make use of past key/values?

For decoder models, transformers can reuse the pre-computed attention key/values as additional inputs for the next pass, which avoids repeated computation and speeds up sequential decoding.
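As an illustration with a plain transformers GPT-2 (not your exact model), this is the difference the cache makes during generation:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello, my name is", return_tensors="pt")

# use_cache=True (the default) reuses past key/values between decoding steps;
# use_cache=False recomputes attention over the full sequence at every step.
fast = model.generate(**inputs, max_length=50, use_cache=True)
slow = model.generate(**inputs, max_length=50, use_cache=False)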

Hi @mineshj1291, the reason why I asked is that the current ORTModelForCausalLM doesn’t have with_past support, so it recomputes the attention for the past sequence during generation. If your PyTorch model makes use of the precomputation, it is not entirely fair to compare it with the current ORTModelForCausalLM, which does extra computation.

Btw, if you are interested, you can follow the PR for adding with_past support to ORTModelForCausalLM. I will try to finish it this week.


The PR adding use_cache + I/O binding support is merged. Here is a quick benchmark I’ve done with the vanilla exported ONNX model (no graph optimization or quantization) running on a T4 GPU.

Please feel free to test it out, and tell us if the performance has been improved.
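A sketch of how trying it out could look is below; the use_cache argument follows the PR mentioned above, and the exact signature may differ depending on your Optimum version:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
from optimum.pipelines import pipeline

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Assumption: use_cache=True exports/loads the decoder with past key/values.
model = ORTModelForCausalLM.from_pretrained(
    "gpt2",
    from_transformers=True,
    use_cache=True,
    provider="CUDAExecutionProvider",
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("Hello, my name is", max_length=50)[0]["generated_text"])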

Thanks for sharing. Will try it.
