With the ONNX Runtime Python API, running a HuggingFace model (for example, Whisper) looks like the following:
import numpy as np
import onnxruntime
from onnxruntime_extensions import get_library_path

audio_file = "audio.mp3"
model = "whisper-tiny-en-all-int8.onnx"  # Generated via Optimum

# Read the audio file as raw bytes (uint8).
with open(audio_file, "rb") as f:
    audio = np.asarray(list(f.read()), dtype=np.uint8)

inputs = {
    "audio_stream": np.array([audio]),
    "max_length": np.array([30], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([5], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
    "attention_mask": np.zeros((1, 80, 3000), dtype=np.int32),
}

# Register the onnxruntime-extensions custom ops required by the model.
options = onnxruntime.SessionOptions()
options.register_custom_ops_library(get_library_path())

session = onnxruntime.InferenceSession(model, options, providers=["CPUExecutionProvider"])
outputs = session.run(None, inputs)[0]
Here we are able to pass a dictionary, inputs, that provides information about how the model should be evaluated. However, I’m not sure how to do this in C++. I checked the Run API and I don’t really see anything like this. How does one do this in C++? Do we have to define it through AddConfigEntry in RunOptions?
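For reference, here is a rough sketch of what I imagine the C++ side would look like, using the Run overload that takes parallel arrays of input names and Ort::Value tensors. The extensions library path (libortextensions.so), the RegisterCustomOpsLibrary and GetOutputNameAllocated calls (available only in recent ONNX Runtime versions), and the tensor shapes are assumptions on my part, mirroring the Python snippet above; I am not sure this is the intended approach:

#include <onnxruntime_cxx_api.h>

#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "whisper");

    Ort::SessionOptions options;
    // Assumption: recent ORT builds expose RegisterCustomOpsLibrary on SessionOptions;
    // the onnxruntime-extensions shared library name differs per platform.
    options.RegisterCustomOpsLibrary(ORT_TSTR("libortextensions.so"));

    Ort::Session session(env, ORT_TSTR("whisper-tiny-en-all-int8.onnx"), options);
    Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);

    // Raw audio bytes, analogous to the uint8 numpy array in the Python version.
    std::ifstream file("audio.mp3", std::ios::binary);
    std::vector<uint8_t> audio((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());
    std::vector<int64_t> audio_shape{1, static_cast<int64_t>(audio.size())};

    // The "scalar" decoding parameters become 1-element tensors.
    std::vector<int32_t> max_length{30}, min_length{1}, num_beams{5}, num_return{1};
    std::vector<float> length_penalty{1.0f}, repetition_penalty{1.0f};
    std::vector<int64_t> one{1};

    std::vector<int32_t> attention_mask(1 * 80 * 3000, 0);
    std::vector<int64_t> mask_shape{1, 80, 3000};

    // The Python dictionary becomes two parallel arrays: input names and Ort::Value
    // tensors, passed to Session::Run in the same order.
    std::vector<const char*> input_names{
        "audio_stream", "max_length", "min_length", "num_beams",
        "num_return_sequences", "length_penalty", "repetition_penalty", "attention_mask"};

    std::vector<Ort::Value> input_values;
    input_values.push_back(Ort::Value::CreateTensor<uint8_t>(
        mem_info, audio.data(), audio.size(), audio_shape.data(), audio_shape.size()));
    auto i32 = [&](std::vector<int32_t>& v) {
        return Ort::Value::CreateTensor<int32_t>(mem_info, v.data(), v.size(), one.data(), one.size());
    };
    auto f32 = [&](std::vector<float>& v) {
        return Ort::Value::CreateTensor<float>(mem_info, v.data(), v.size(), one.data(), one.size());
    };
    input_values.push_back(i32(max_length));
    input_values.push_back(i32(min_length));
    input_values.push_back(i32(num_beams));
    input_values.push_back(i32(num_return));
    input_values.push_back(f32(length_penalty));
    input_values.push_back(f32(repetition_penalty));
    input_values.push_back(Ort::Value::CreateTensor<int32_t>(
        mem_info, attention_mask.data(), attention_mask.size(),
        mask_shape.data(), mask_shape.size()));

    // There is no equivalent of passing None for output names, so ask the session.
    Ort::AllocatorWithDefaultOptions allocator;
    auto output_name = session.GetOutputNameAllocated(0, allocator);
    const char* output_names[] = {output_name.get()};

    std::vector<Ort::Value> outputs = session.Run(
        Ort::RunOptions{nullptr},
        input_names.data(), input_values.data(), input_values.size(),
        output_names, 1);
    return 0;
}

Is this the right direction, or is there a way to attach these named inputs through RunOptions instead?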