Got `ONNXRuntimeError` when trying to run BART in ONNX format #12851

Environment info

  • transformers version: 4.9.0
  • Platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.11
  • PyTorch version (GPU?): 1.9.0+cu102 (True)
  • Using GPU in script?: Yes

I was using Google Colab and trying to export the model facebook/bart-large-cnn to the ONNX format. I ran the command python -m transformers.onnx -m=facebook/bart-large-cnn onnx/bart-large-cnn, and the output seemed okay.

2021-07-22 23:14:33.821472: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Using framework PyTorch: 1.9.0+cu102
Overriding 1 configuration item(s)
	- use_cache -> False
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:212: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:218: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:249: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:863: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
tcmalloc: large alloc 1625399296 bytes == 0x5595ce83a000 @  0x7f1780d9f887 0x7f177f695c29 0x7f177f696afb 0x7f177f696bb4 0x7f177f696f9c 0x7f17670dcbb7 0x7f17670dd064 0x7f175b75ba1c 0x7f176bf8eaff 0x7f176b949b88 0x55949fda8bf8 0x55949fe1c6f2 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3
tcmalloc: large alloc 1625399296 bytes == 0x55962f654000 @  0x7f1780d9f887 0x7f177f695c29 0x7f177f696afb 0x7f177f696bb4 0x7f177f696f9c 0x7f17670dcbb7 0x7f17670dd064 0x7f175b75ba1c 0x7f176bf8ecab 0x7f176b949b88 0x55949fda8bf8 0x55949fe1c6f2 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3
tcmalloc: large alloc 1625399296 bytes == 0x5595ce83a000 @  0x7f1780d9d1e7 0x55949fdd9a18 0x55949fda4987 0x7f176bf8ece2 0x7f176b949b88 0x55949fda8bf8 0x55949fe1c6f2 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3 0x55949fe16c35 0x55949fda973a 0x55949fe17b0e 0x55949fe16c35 0x55949fce8eb1
tcmalloc: large alloc 1625399296 bytes == 0x55962f654000 @  0x7f1780d9f887 0x7f177f695c29 0x7f177f695d47 0x7f177f6977a5 0x7f176bd60368 0x7f176bfbc844 0x7f176b949b88 0x55949fda8010 0x55949fda7da0 0x55949fe1bbb3 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3 0x55949fe16c35 0x55949fda973a
Validating ONNX model...
	-[✓] ONNX model outputs' name match reference model ({'last_hidden_state', 'encoder_last_hidden_state'}
	- Validating ONNX Model output "last_hidden_state":
		-[✓] (2, 8, 1024) matchs (2, 8, 1024)
		-[✓] all values close (atol: 0.0001)
	- Validating ONNX Model output "encoder_last_hidden_state":
		-[✓] (2, 8, 1024) matchs (2, 8, 1024)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: onnx/bart-large-cnn/model.onnx

Then I tried to run the model in onnxruntime:

import onnxruntime as ort

ort_session = ort.InferenceSession('onnx/bart-large-cnn/model.onnx')

# Got input_ids and attention_mask using tokenizer

outputs = ort_session.run(None, {'input_ids': input_ids.detach().cpu().numpy(), 'attention_mask': attention_mask.detach().cpu().numpy()})

And I got this error:

---------------------------------------------------------------------------
RuntimeException                          Traceback (most recent call last)
<ipython-input-30-380e6a0e1085> in <module>()
----> 1 outputs = ort_session.run(None, {'input_ids': input_ids.detach().cpu().numpy(), 'attention_mask': attention_mask.detach().cpu().numpy()})

/usr/local/lib/python3.7/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    186             output_names = [output.name for output in self._outputs_meta]
    187         try:
--> 188             return self._sess.run(output_names, input_feed, run_options)
    189         except C.EPFail as err:
    190             if self._enable_fallback:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_109' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:42 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<long int>&, bool) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{2}, requested shape:{1,1}

I see that ONNX export for BART is supported as of the latest release, but I couldn’t find any example code showing how to run inference in onnxruntime. Maybe I’m doing something wrong here, so any help would be appreciated!
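
Reading the status message literally, the failing Reshape node received a tensor with 2 elements (shape {2}) and was asked to produce shape {1,1}, which holds only 1 element. The sketch below reproduces just that element-count rule in plain Python. The helper names element_count and same_element_count are made up for illustration, and the check ignores the special 0 and -1 entries that ONNX Reshape also accepts. It does not explain why the exported graph requests {1,1}.

```python
def element_count(shape):
    """Multiply all dimensions of a fully specified shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def same_element_count(input_shape, requested_shape):
    """Return True when both shapes hold the same number of elements.

    Hypothetical helper for illustration only: it covers fully specified
    shapes and ignores the 0 and -1 placeholders that ONNX Reshape allows.
    """
    return element_count(input_shape) == element_count(requested_shape)


# Shapes taken from the error message: 2 elements cannot fill a 1x1 tensor.
print(same_element_count((2,), (1, 1)))  # False
# A target shape with the same number of elements passes the check.
print(same_element_count((2,), (1, 2)))  # True
```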

Hi @AlfredWGA,

Thanks for bringing this up to us.

As a sanity check, could you please share the shapes of input_ids and attention_mask? Also, if you can share the ONNX Runtime version you’re using, that would be very helpful.
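
A minimal sketch for collecting that information, assuming the model path from the export step and a feed dict like the one passed to ort_session.run. The helper names describe_feed and describe_model_inputs are hypothetical; InferenceSession, get_inputs() and ort.__version__ come from the onnxruntime Python package.

```python
def describe_feed(feed):
    """Map each input name to its (shape, dtype) for pasting into a report."""
    return {
        name: (tuple(array.shape), str(array.dtype))
        for name, array in feed.items()
    }


def describe_model_inputs(model_path):
    """Return the onnxruntime version and the inputs the exported model declares."""
    import onnxruntime as ort  # imported lazily so describe_feed works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    declared = [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]
    return ort.__version__, declared


# Example usage (uncomment once input_ids and attention_mask exist):
# feed = {
#     "input_ids": input_ids.detach().cpu().numpy(),
#     "attention_mask": attention_mask.detach().cpu().numpy(),
# }
# print(describe_feed(feed))
# print(describe_model_inputs("onnx/bart-large-cnn/model.onnx"))
```

Comparing the declared input shapes with the shapes actually fed may help show whether an axis was exported with a fixed size rather than a dynamic one.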

Thanks!

Thanks for replying!

All the code can be viewed in this Google Colaboratory notebook.

Hello,

I am running into the exact same issue. Has a solution been found yet?

Sincerely