Exporting model wav2vec2 not supported?

Hi all, I’m trying to convert the model nguyenvulebinh/wav2vec2-base-vietnamese-250h, a wav2vec2 speech-to-text model, to ONNX.

from pathlib import Path
import transformers
from transformers.onnx import FeaturesManager
from transformers import AutoConfig, AutoTokenizer, AutoModelForAudioClassification

# load model and tokenizer

model_id = "nguyenvulebinh/wav2vec2-base-vietnamese-250h"
feature = "audio-classification"
model = AutoModelForAudioClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load config

model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
onnx_config = model_onnx_config(model.config)

# export

onnx_inputs, onnx_outputs = transformers.onnx.export(
    preprocessor=tokenizer,
    model=model,
    config=onnx_config,
    opset=13,
    output=Path("model.onnx"),
)

And I got this error:
Exception has occurred: KeyError

“wav2vec2 is not supported yet. Only [‘albert’, ‘bart’, ‘beit’, ‘bert’, ‘big-bird’, ‘bigbird-pegasus’, ‘blenderbot’, ‘blenderbot-small’, ‘bloom’, ‘camembert’, ‘clip’, ‘codegen’, ‘convbert’, ‘convnext’, ‘data2vec-text’, ‘data2vec-vision’, ‘deberta’, ‘deberta-v2’, ‘deit’, ‘detr’, ‘distilbert’, ‘electra’, ‘flaubert’, ‘gpt2’, ‘gptj’, ‘gpt-neo’, ‘groupvit’, ‘ibert’, ‘imagegpt’, ‘layoutlm’, ‘layoutlmv3’, ‘levit’, ‘longt5’, ‘longformer’, ‘marian’, ‘mbart’, ‘mobilebert’, ‘mobilenet-v1’, ‘mobilenet-v2’, ‘mobilevit’, ‘mt5’, ‘m2m-100’, ‘owlvit’, ‘perceiver’, ‘poolformer’, ‘rembert’, ‘resnet’, ‘roberta’, ‘roformer’, ‘segformer’, ‘squeezebert’, ‘swin’, ‘t5’, ‘vision-encoder-decoder’, ‘vit’, ‘whisper’, ‘xlm’, ‘xlm-roberta’, ‘yolos’] are supported. If you want to support wav2vec2 please propose a PR or open up an issue.”

It seems like it’s not supported yet. Is there a way to request it?

I also tried to generate the ONNX model using the CLI:

optimum-cli export onnx --model nguyenvulebinh/wav2vec2-base-vietnamese-250h onnxOptimum/
Framework not specified. Using pt to export to ONNX.
/home/ace/.local/lib/python3.10/site-packages/transformers/configuration_utils.py:380: UserWarning: Passing gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead, or if you are using the Trainer API, pass gradient_checkpointing=True in your TrainingArguments.
warnings.warn(
Automatic task detection to automatic-speech-recognition (possible synonyms are: audio-ctc, speech2seq-lm).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using framework PyTorch: 2.0.1+cu117
/home/ace/.local/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:595: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/ace/.local/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:634: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Post-processing the exported models…
Validating ONNX model onnxOptimum/model.onnx…
-[✓] ONNX model output names match reference model (logits)
- Validating ONNX Model output “logits”:
-[✓] (2, 49, 110) matches (2, 49, 110)
- values not close enough, max diff: 4.9054622650146484e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:

  • logits: max diff = 4.9054622650146484e-05.
    The exported model was saved at: onnxOptimum

The line "- values not close enough, max diff: 4.9054622650146484e-05 (atol: 1e-05)" was marked with an X, not a tick. Does that mean the generated model would not work either?

Hi @xieu90, you got this error because you used transformers to convert your model to ONNX. The Transformers API to convert models to ONNX is deprecated and not maintained anymore, and we recommend using Optimum instead :)

So you did well using Optimum CLI! Regarding the warning you got, the CLI runs a validation step by default after exporting the model. It seems that one test led to a slightly higher difference than expected, but 4.9e-05 still sounds acceptable so I think your ONNX model will work fine.
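To make the validation message concrete, here is a minimal sketch of what an absolute-tolerance check amounts to. It is not Optimum's actual validation code: the function names and the logit values are made up for illustration, with the values chosen so that the largest gap is roughly the 4.9e-05 reported in the log.

```python
def max_abs_diff(reference, exported):
    """Largest element-wise absolute difference between two flat lists."""
    return max(abs(r - e) for r, e in zip(reference, exported))


def within_atol(reference, exported, atol):
    """True if every element differs by at most `atol`."""
    return max_abs_diff(reference, exported) <= atol


# Made-up logits (assumption): the largest gap is about 4.9e-05
reference_logits = [0.25, -1.5, 3.0]
exported_logits = [0.25, -1.5 + 4.9e-05, 3.0 - 1e-06]

diff = max_abs_diff(reference_logits, exported_logits)
strict_ok = within_atol(reference_logits, exported_logits, atol=1e-05)
loose_ok = within_atol(reference_logits, exported_logits, atol=1e-04)

print(f"max diff = {diff:.3e}")
print(f"within atol=1e-05: {strict_ok}")
print(f"within atol=1e-04: {loose_ok}")
```

The same difference fails a 1e-05 tolerance and passes a 1e-04 one, which is why the export is reported as succeeding with a warning rather than as a broken model. If I remember correctly, the exporter also accepts an --atol option to set the tolerance yourself, but please check optimum-cli export onnx --help for your version.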

Thank you, regisss.
Today I tried exporting again with the same CLI command:
~/.local/bin/optimum-cli export onnx --model nguyenvulebinh/wav2vec2-base-vietnamese-250h onnxOptimum
The first time the max diff was about 4e-05, like before. The second time it was about 3.7e-05, and the third time I somehow got this:
Post-processing the exported models…
Validating ONNX model onnxOptimum/model.onnx…
-[✓] ONNX model output names match reference model (logits)
- Validating ONNX Model output “logits”:
-[✓] (2, 49, 110) matches (2, 49, 110)
- values not close enough, max diff: 0.00010091066360473633 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:

  • logits: max diff = 0.00010091066360473633.
    The exported model was saved at: onnxOptimum

Which one would be best, then? ^^

@xieu90 The exported model is the same each time. However, validation is performed with random input values, which is why you see a different max diff on each run. The differences are small enough to say that the exported model seems to behave well :)
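Here is a toy sketch of why the reported max diff changes between runs. It does not use Optimum or the real model: reference_model and exported_model are hypothetical stand-ins, and the "exported" one rounds to float32 at every step to mimic small numerical differences. The two functions never change between runs, only the random inputs do.

```python
import random
import struct

# Hypothetical fixed "weights", shared by both stand-in models
WEIGHTS = [0.1 * (i + 1) for i in range(8)]


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reference_model(xs):
    """Stand-in for the original model: weighted sum in float64."""
    return sum(w * x for w, x in zip(WEIGHTS, xs))


def exported_model(xs):
    """Stand-in for the exported model: same math, rounded to float32 at each step."""
    acc = 0.0
    for w, x in zip(WEIGHTS, xs):
        acc = to_float32(acc + to_float32(w * x))
    return acc


def validation_max_diff(seed, n_samples=100):
    """Max absolute output difference over one batch of random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_samples):
        xs = [rng.uniform(-1.0, 1.0) for _ in WEIGHTS]
        worst = max(worst, abs(reference_model(xs) - exported_model(xs)))
    return worst


# Three "validation runs": same two models, different random inputs
diffs = [validation_max_diff(seed) for seed in (0, 1, 2)]
for seed, d in zip((0, 1, 2), diffs):
    print(f"run with seed {seed}: max diff = {d:.3e}")
```

Each run prints a slightly different max diff even though nothing about the two "models" changed, so there is no best export to pick among your three attempts.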
