Hi,
I'm trying to export the whisper-large model to ONNX, but I ran into the error below:
Fail: [ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_8599 failed.tensorprotoutils.cc:637 TensorProtoToTensor External initializer: onnx::MatMul_8599 offset: 0 size to read: 26214400 given file_length: 6553600 are out of bounds or can not be read in full.
Before hitting the error, I also got these warning messages:
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
How can I convert the openai/whisper-large model to ONNX format?
Hi @serdarcaglar, thank you for the report! Could you provide a reproducible command or code snippet to make it easier for us to track down the issue?
The ONNX export through transformers.onnx will soon rely fully on Optimum Exporters (the package for all things export). Currently, using the stable optimum==1.5.1, the following export command works well:
python -m optimum.exporters.onnx --model openai/whisper-tiny whisper_tiny_onnx_vanilla
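As a quick sanity check, you can load the exported file back with ONNX Runtime. This is only a sketch, and it assumes the vanilla export writes a single model.onnx into the output folder (the file name is an assumption):
import onnxruntime as ort

# Load the exported graph and print its input names
# ("whisper_tiny_onnx_vanilla/model.onnx" is an assumed path)
session = ort.InferenceSession("whisper_tiny_onnx_vanilla/model.onnx")
print([inp.name for inp in session.get_inputs()])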
In the next release of Optimum (which you can hopefully expect sometime next week), the exporter will support exporting the encoder and decoder as two separate files, making the model easier to use with ONNX Runtime:
python -m optimum.exporters.onnx --model openai/whisper-tiny --for-ort whisper_tiny_onnx
This will allow you to export your model and load it directly from a local folder into ORTModelForSpeechSeq2Seq.
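For reference, loading that export back might look like the following. A minimal sketch, assuming the upcoming release keeps the current from_pretrained API and that whisper_tiny_onnx is the output folder from the command above:
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

# Load the encoder/decoder ONNX files produced by --for-ort from a local folder
model = ORTModelForSpeechSeq2Seq.from_pretrained("whisper_tiny_onnx")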
The code I used for exporting:
from datasets import load_dataset
from transformers import AutoProcessor, pipeline
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("openai/whisper-large")
# from_transformers=True exports the PyTorch checkpoint to ONNX on the fly
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large", from_transformers=True)
speech_recognition_pipeline = pipeline(
    "automatic-speech-recognition",
    model=model,
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
)
Warning Messages:
/home/joseph/miniconda3/envs/ort-deploy/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/joseph/miniconda3/envs/ort-deploy/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:239: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/joseph/miniconda3/envs/ort-deploy/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:750: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1:
/home/joseph/miniconda3/envs/ort-deploy/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:74: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min))
/home/joseph/miniconda3/envs/ort-deploy/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:207: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/joseph/miniconda3/envs/ort-deploy/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:79: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
ERROR:
Fail: [ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_9737 failed.tensorprotoutils.cc:637 TensorProtoToTensor External initializer: onnx::MatMul_9737 offset: 0 size to read: 26214400 given file_length: 6553600 are out of bounds or can not be read in full.
Hi @serdarcaglar, this is currently an issue with models saved in the external data format. We have a PR [255] open for it, and the fix should be available soon. In the meantime, you can run the above model by disabling the cache:
from datasets import load_dataset
from transformers import AutoProcessor, pipeline
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("openai/whisper-large")
# use_cache=False skips the decoder-with-past export, working around the external-data issue
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large", from_transformers=True, use_cache=False)
speech_recognition_pipeline = pipeline(
    "automatic-speech-recognition",
    model=model,
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
)
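If it helps, here is a minimal usage sketch for the resulting pipeline; the dummy LibriSpeech dataset below is just an illustrative choice, not part of the fix:
# Transcribe one short 16 kHz sample with the ONNX-backed pipeline
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
result = speech_recognition_pipeline(ds[0]["audio"]["array"])
print(result["text"])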