How to configure ONNX models from Hugging Face to use model options in C++?

With the ONNX Runtime Python API, Hugging Face models (for example, Whisper) can be run with code like the following:

import numpy as np
import onnxruntime
from onnxruntime_extensions import get_library_path

audio_file = "audio.mp3"
model = "whisper-tiny-en-all-int8.onnx" # Generated via Optimum
with open(audio_file, "rb") as f:
    audio = np.asarray(list(f.read()), dtype=np.uint8)

inputs = {
    "audio_stream": np.array([audio]),
    "max_length": np.array([30], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([5], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
    "attention_mask": np.zeros((1, 80, 3000), dtype=np.int32),
}

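options = onnxruntime.SessionOptions()
# Register the onnxruntime-extensions custom ops library (the reason get_library_path is imported).
options.register_custom_ops_library(get_library_path())
session = onnxruntime.InferenceSession(model, options, providers=["CPUExecutionProvider"])
outputs = session.run(None, inputs)[0]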

Here we can pass a dictionary, inputs, whose entries (max_length, num_beams, repetition_penalty, and so on) control how the model is evaluated.

However, I’m not sure how to do this in C++. I checked the Run API and I don’t really see anything like this. How does one do this in C++? Do we have to define it through AddConfigEntry in RunOptions?
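
My working assumption, which may be wrong, is that each key in the dictionary is just a named graph input and each value is an ordinary tensor. If so, on the C++ side they would go through the parallel input-name and input-value arrays that Ort::Session::Run takes, rather than through RunOptions. A minimal sketch of the dictionary split into that shape, assuming only NumPy and using placeholder bytes in place of a real audio file:

```python
import numpy as np

# Placeholder bytes standing in for the contents of an audio file (assumption).
audio = np.frombuffer(b"\x00\x01\x02\x03", dtype=np.uint8)

inputs = {
    "audio_stream": np.array([audio]),
    "max_length": np.array([30], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([5], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
    "attention_mask": np.zeros((1, 80, 3000), dtype=np.int32),
}

# Dicts preserve insertion order, so the two lists stay aligned index by index.
input_names = list(inputs.keys())
input_values = list(inputs.values())

# Each entry is one (name, tensor) pair: a name string plus a typed, shaped array.
for name, value in zip(input_names, input_values):
    print(f"{name}: dtype={value.dtype}, shape={value.shape}")
```

If that reading is right, the C++ version would need one input name and one tensor of the matching dtype and shape per entry printed above.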