Exporting UDOP to ONNX fails

Exporting the UDOP model to ONNX using torch.onnx.export() fails. Below is the output from running transformers-cli env:

  • transformers version: 4.39.3
  • Platform: Linux-5.4.0-1072-aws-x86_64-with-glibc2.29
  • Python version: 3.10.13
  • Huggingface_hub version: 0.21.4
  • Safetensors version: 0.4.2
  • Accelerate version: 0.28.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2+cu118 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Here is a minimal code example that reproduces the issue:

from PIL import Image
from torch.onnx import export
from transformers import (
    UdopEncoderModel,
    UdopProcessor,
)

# Load UDOP processor
udop_processor = UdopProcessor.from_pretrained("microsoft/udop-large", apply_ocr=False)
# Create dummy input for ONNX export
dummy_encodings = udop_processor(
    images=Image.new(mode='RGB', size=(224, 224)),
    text=['dummy', 'text'],
    boxes=[[0, 0, 100, 100], [750, 750, 850, 850]],
    return_tensors="pt",
    max_length=512,
    padding="max_length",
)
# Load base UDOP model from HuggingFace hub
udop_model = UdopEncoderModel.from_pretrained("microsoft/udop-large")
# Export model to ONNX using torch.onnx.export()
export(
    model=udop_model,
    args=(
        dummy_encodings.input_ids,
        dummy_encodings.bbox,
        dummy_encodings.attention_mask,
        dummy_encodings.pixel_values,
    ),
    f='model.onnx',
)
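
For reference, a fuller export call would typically also name the inputs and mark the batch and sequence axes as dynamic; the failure below occurs with or without these options (the names here are illustrative, not taken from my actual script):

export(
    model=udop_model,
    args=(
        dummy_encodings.input_ids,
        dummy_encodings.bbox,
        dummy_encodings.attention_mask,
        dummy_encodings.pixel_values,
    ),
    f='model.onnx',
    input_names=['input_ids', 'bbox', 'attention_mask', 'pixel_values'],
    dynamic_axes={
        'input_ids': {0: 'batch', 1: 'sequence'},
        'bbox': {0: 'batch', 1: 'sequence'},
        'attention_mask': {0: 'batch', 1: 'sequence'},
        'pixel_values': {0: 'batch'},
    },
)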

Running this example produces the following error:

/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:399: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height != self.image_size[0] or width != self.image_size[1]:
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:339: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  torch.arange(len(ocr_points))[:, None].repeat(1, ocr_points.size(-1))[:, :, None].to(ocr_points),
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:345: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  rows, cols = zip(*ind)
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:345: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  rows, cols = zip(*ind)
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:348: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  input_vision_patches = [image_embeddings[i][patch_inds[i]] for i in range(len(patch_inds))]
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:355: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  visual_bbox = [visual_bbox[i][patch_inds[i]] for i in range(len(patch_inds))]
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:357: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  visual_attention_mask = [torch.tensor([1] * len(item)).to(attention_mask) for item in visual_bbox]
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:357: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  visual_attention_mask = [torch.tensor([1] * len(item)).to(attention_mask) for item in visual_bbox]
/venv/lib/python3.10/site-packages/transformers/models/udop/modeling_udop.py:293: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if m > 0:
Traceback (most recent call last):
  File "src/pd_ai_data_extractor/scripts/udop_to_onnx.py", line 24, in <module>
    export(
  File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1613, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/venv/lib/python3.10/site-packages/torch/jit/_trace.py", line 1296, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/venv/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
    graph, out = torch._C._create_graph_by_tracing(
RuntimeError: 0 INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":615, please report a bug to PyTorch. We don't have an op for aten::full_like but it isn't a special case.  Argument types: Tensor, bool, NoneType, NoneType, NoneType, bool, NoneType,

Candidates:
        aten::full_like(Tensor self, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
        aten::full_like.out(Tensor self, Scalar fill_value, *, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!)
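
Both candidate overloads take a Scalar fill_value, while the traced call passes a Python bool, which matches neither. If that reading is right, the assert should be reproducible outside UDOP with a minimal module like the following (an assumption on my part; I have only observed the failure through the UDOP export):

import io
import torch

class FullLikeBool(torch.nn.Module):
    def forward(self, x):
        # Same pattern as modeling_udop.py line 336: a bool fill value
        return torch.full_like(x, True).bool()

# I would expect this to hit the same aten::full_like assert on torch 2.2.2
torch.onnx.export(FullLikeBool(), (torch.zeros(2, 3),), io.BytesIO())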

I have tracked this issue down to line 336 of transformers/models/udop/modeling_udop.py:

patch_inds = torch.full_like(image_embeddings[:, :, 0], True).bool()

This issue can be corrected by replacing the bool fill value with an int:

patch_inds = torch.full_like(image_embeddings[:, :, 0], 1).bool()
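
Editing the installed modeling_udop.py by hand is awkward, so as an alternative I would expect a small monkey-patch to have the same effect, converting a bool fill value to an int before the tracer records the call (a sketch, assuming modeling_udop looks up torch.full_like at call time; _orig_full_like and _full_like_int_fill are names I made up):

import torch

# Keep a reference to the original op and wrap it so that a Python bool
# fill value is converted to the matching Scalar (int) form.
_orig_full_like = torch.full_like

def _full_like_int_fill(input, fill_value, **kwargs):
    if isinstance(fill_value, bool):
        fill_value = int(fill_value)  # True -> 1
    return _orig_full_like(input, fill_value, **kwargs)

torch.full_like = _full_like_int_fill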

However, after making that change (by either route), there is a new issue: the export now runs, but the exporter warns that the exported model fails ONNX shape inference and will not be executable by ONNX Runtime.

The same TracerWarnings as above are printed again, followed by:
/venv/lib/python3.10/site-packages/torch/onnx/utils.py:1703: UserWarning: The exported ONNX model failed ONNX shape inference. The model will not be executable by the ONNX Runtime. If this is unintended and you believe there is a bug, please report an issue at https://github.com/pytorch/pytorch/issues. Error reported by strict ONNX shape inference: [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /encoder/Add_3): [ShapeInferenceError] Incompatible dimensions
(op_type:Concat, node name: /encoder/Concat_9): [TypeInferenceError] Input 0 expected to have type but instead is null
(op_type:Add, node name: /encoder/Add_4): [TypeInferenceError] Input 0 expected to have type but instead is null
 (Triggered internally at ../torch/csrc/jit/serialization/export.cpp:1484.)
  _C._check_onnx_proto(proto)

I have not been able to track down the cause of this new issue, so any help with it would be much appreciated.
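
For anyone investigating, the shape-inference errors can be reproduced outside the exporter with the onnx package, which may make the offending Add/Concat nodes easier to inspect (a diagnostic sketch, assuming the invalid model.onnx was still written to disk despite the warning):

import onnx
from onnx import shape_inference

# Rerun strict shape inference on the exported proto to surface the
# same errors the exporter reported.
onnx_model = onnx.load('model.onnx')
shape_inference.infer_shapes(onnx_model, strict_mode=True)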

In addition, I submitted a bug report to PyTorch: ONNX export fails for aten::full_like op when exporting UDOP model from transformers (pytorch/pytorch#122898). There are some suggestions in that issue, but none resolved the underlying bug, since it appears to be an issue with the UDOP implementation in transformers.