openai/whisper-large-v3 ONNX validation

My spec

  • optimum version: 1.16.1
  • transformers version: 4.37.0.dev0
  • Platform: macOS-14.1-arm64-arm-64bit
  • Python version: 3.11.5
  • Huggingface_hub version: 0.19.4
  • PyTorch version (GPU?): 2.1.2 (cuda available: False)
  • Tensorflow version (GPU?): 'not installed' (cuda available: 'NA')

Converting the openai/whisper-large-v3 model to ONNX produces a warning during the ONNX validation step about values not being close enough. I still get normal outputs when I use the exported model, but do you think this is expected and normal?

The output is

Post-processing the exported models...
Deduplicating shared (tied) weights...
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
    model.decoder.embed_tokens.weight: {'model.decoder.embed_tokens.weight'}
    proj_out.weight: {'onnx::MatMul_11581'}
Removing duplicate initializer onnx::MatMul_11581...
Validating ONNX model onnx/out/encoder_model.onnx...
    -[✓] ONNX model output names match reference model (last_hidden_state)
    - Validating ONNX Model output "last_hidden_state":
        -[✓] (2, 1500, 1280) matches (2, 1500, 1280)
        -[x] values not close enough, max diff: 0.006580352783203125 (atol: 0.001)
Validating ONNX model onnx/out/decoder_model.onnx...
    -[✓] ONNX model output names match reference model (logits)
    - Validating ONNX Model output "logits":
        -[✓] (2, 16, 51866) matches (2, 16, 51866)
        -[✓] all values close (atol: 0.001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.001:
- last_hidden_state: max diff = 0.006580352783203125.
The exported model was saved at: onnx/out
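As far as I can tell, the check behind these messages boils down to a max-absolute-difference comparison between the PyTorch and ONNX Runtime outputs; a rough sketch of the idea (not Optimum's actual code):

import numpy as np

def values_close(reference: np.ndarray, exported: np.ndarray, atol: float = 1e-3) -> bool:
    # Report the largest elementwise absolute difference and flag the
    # output when it exceeds the absolute tolerance.
    max_diff = float(np.abs(reference - exported).max())
    print(f"max diff: {max_diff} (atol: {atol})")
    return max_diff <= atol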

Hi,

This is fine. As long as you get green checks [✓] at the default absolute tolerance, the conversion went well.
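If the warning bothers you, you can also loosen the validation tolerance at export time. A minimal sketch, assuming the atol argument of main_export (the CLI exposes the same thing as --atol); 1e-2 here is an arbitrary example value, not a recommendation:

from optimum.exporters.onnx import main_export

# Raise the validation tolerance so a ~7e-3 max diff no longer warns.
main_export(
    "openai/whisper-large-v3",
    output="onnx/out",
    task="automatic-speech-recognition",
    atol=1e-2,
)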

Sorry, the failing line didn't copy correctly and looked like a 'success' checkmark, but it was in fact a cross. Here is a screenshot:

And the code (I am using the main_export function from optimum.exporters.onnx):

from transformers import WhisperConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import WhisperOnnxConfig

model_id = "openai/whisper-large-v3"

print("Exporting model as ONNX")

# Build the ONNX export config for Whisper and split it into
# encoder-only and decoder-only variants.
config = WhisperConfig.from_pretrained(model_id)
onnx_config = WhisperOnnxConfig(config, task="automatic-speech-recognition")

encoder_config = onnx_config.with_behavior("encoder")
decoder_config = onnx_config.with_behavior("decoder")

# Map each exported file name to its config.
custom_onnx_configs = {
    "encoder_model": encoder_config,
    "decoder_model": decoder_config,
}

main_export(
    model_id,
    output="onnx/out",
    task="automatic-speech-recognition",
    custom_onnx_configs=custom_onnx_configs,
)
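As a sanity check on the exported files, I load them back with Optimum's ONNX Runtime wrapper. A minimal sketch, assuming optimum[onnxruntime] is installed; use_cache=False because the export above does not produce a decoder-with-past model:

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

ort_model = ORTModelForSpeechSeq2Seq.from_pretrained("onnx/out", use_cache=False)
print("Loaded:", type(ort_model).__name__)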

I tried to trace through the library code to investigate, but it goes beyond my ability to follow, especially since I'm new to Optimum and ONNX Runtime and only have basic Python skills.

Sometimes I've observed max diffs up to 20x the atol threshold of 0.001.
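To quantify the drift myself without tracing Optimum internals, I compare the PyTorch encoder against the exported one on random input. A rough sketch, assuming the input shape whisper-large-v3 expects (batch, 128 mel bins, 3000 frames) and that the encoder's ONNX input is named input_features (check session.get_inputs() if not):

import numpy as np
import onnxruntime as ort
import torch
from transformers import WhisperModel

model_id = "openai/whisper-large-v3"

# Reference PyTorch encoder.
torch_model = WhisperModel.from_pretrained(model_id).eval()

# whisper-large-v3 expects 128 mel bins over 3000 frames (30 s of audio).
features = np.random.randn(1, 128, 3000).astype(np.float32)

with torch.no_grad():
    reference = torch_model.encoder(torch.from_numpy(features)).last_hidden_state.numpy()

session = ort.InferenceSession("onnx/out/encoder_model.onnx")
exported = session.run(None, {"input_features": features})[0]

print("max abs diff:", np.abs(reference - exported).max())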