With the ONNX Runtime Python API, running a HuggingFace model (for example, Whisper) looks like the following:
import numpy as np
import onnxruntime
from onnxruntime_extensions import get_library_path

audio_file = "audio.mp3"
model = "whisper-tiny-en-all-int8.onnx"  # Generated via Optimum

# Read the audio file as raw bytes (uint8).
with open(audio_file, "rb") as f:
    audio = np.asarray(list(f.read()), dtype=np.uint8)

inputs = {
    "audio_stream": np.array([audio]),
    "max_length": np.array([30], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([5], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
    "attention_mask": np.zeros((1, 80, 3000), dtype=np.int32),
}

# Register the onnxruntime-extensions custom ops required by the model.
options = onnxruntime.SessionOptions()
options.register_custom_ops_library(get_library_path())

session = onnxruntime.InferenceSession(model, options, providers=["CPUExecutionProvider"])
outputs = session.run(None, inputs)[0]
Here we are able to pass a dictionary, inputs, that provides information about how the model should be evaluated. However, I’m not sure how to do this in C++. I checked the Run API and I don’t really see anything like this. How does one do this in C++? Do we have to define it through AddConfigEntry in RunOptions?
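For reference, here is a rough sketch of what I imagine the C++ side would look like, using the Run overload that takes parallel arrays of input names and Ort::Value tensors. The extensions library path (libortextensions.so), the RegisterCustomOpsLibrary and GetOutputNameAllocated calls (available only in recent ONNX Runtime versions), and the tensor shapes are assumptions on my part, mirroring the Python snippet above; I am not sure this is the intended approach:

#include <onnxruntime_cxx_api.h>

#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "whisper");

    Ort::SessionOptions options;
    // Assumption: recent ORT builds expose RegisterCustomOpsLibrary on SessionOptions;
    // the onnxruntime-extensions shared library name differs per platform.
    options.RegisterCustomOpsLibrary(ORT_TSTR("libortextensions.so"));

    Ort::Session session(env, ORT_TSTR("whisper-tiny-en-all-int8.onnx"), options);
    Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);

    // Raw audio bytes, analogous to the uint8 numpy array in the Python version.
    std::ifstream file("audio.mp3", std::ios::binary);
    std::vector<uint8_t> audio((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());
    std::vector<int64_t> audio_shape{1, static_cast<int64_t>(audio.size())};

    // The "scalar" decoding parameters become 1-element tensors.
    std::vector<int32_t> max_length{30}, min_length{1}, num_beams{5}, num_return{1};
    std::vector<float> length_penalty{1.0f}, repetition_penalty{1.0f};
    std::vector<int64_t> one{1};

    std::vector<int32_t> attention_mask(1 * 80 * 3000, 0);
    std::vector<int64_t> mask_shape{1, 80, 3000};

    // The Python dictionary becomes two parallel arrays: input names and Ort::Value
    // tensors, passed to Session::Run in the same order.
    std::vector<const char*> input_names{
        "audio_stream", "max_length", "min_length", "num_beams",
        "num_return_sequences", "length_penalty", "repetition_penalty", "attention_mask"};

    std::vector<Ort::Value> input_values;
    input_values.push_back(Ort::Value::CreateTensor<uint8_t>(
        mem_info, audio.data(), audio.size(), audio_shape.data(), audio_shape.size()));
    auto i32 = [&](std::vector<int32_t>& v) {
        return Ort::Value::CreateTensor<int32_t>(mem_info, v.data(), v.size(), one.data(), one.size());
    };
    auto f32 = [&](std::vector<float>& v) {
        return Ort::Value::CreateTensor<float>(mem_info, v.data(), v.size(), one.data(), one.size());
    };
    input_values.push_back(i32(max_length));
    input_values.push_back(i32(min_length));
    input_values.push_back(i32(num_beams));
    input_values.push_back(i32(num_return));
    input_values.push_back(f32(length_penalty));
    input_values.push_back(f32(repetition_penalty));
    input_values.push_back(Ort::Value::CreateTensor<int32_t>(
        mem_info, attention_mask.data(), attention_mask.size(),
        mask_shape.data(), mask_shape.size()));

    // There is no equivalent of passing None for output names, so ask the session.
    Ort::AllocatorWithDefaultOptions allocator;
    auto output_name = session.GetOutputNameAllocated(0, allocator);
    const char* output_names[] = {output_name.get()};

    std::vector<Ort::Value> outputs = session.Run(
        Ort::RunOptions{nullptr},
        input_names.data(), input_values.data(), input_values.size(),
        output_names, 1);
    return 0;
}

Is this the right direction, or is there a way to attach these named inputs through RunOptions instead?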