DeBERTaV3 ONNX conversion error

Hi :hugs:

I am trying to convert a fine-tuned DeBERTaV3 model into an ONNX graph using the transformers.onnx Python CLI, but it fails with the following error.

The GitHub issue tracking ONNXConfig support says that DeBERTaV2 has already been implemented. [github issue]

Is there a specific version of transformers that includes this update, or should I go ahead and write my own ONNXConfig for it (which honestly seems like a pain :sweat_smile:)?

2022-07-21 10:21:22.414296: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Some weights of the model checkpoint at deberta-v3-large-conll-doccano/ were not used when initializing DebertaV2Model: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/__main__.py", line 107, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/__main__.py", line 76, in main
    model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=args.feature)
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/features.py", line 519, in check_supported_model_or_raise
    model_features = FeaturesManager.get_supported_features_for_model_type(model_type, model_name=model_name)
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/features.py", line 422, in get_supported_features_for_model_type
    f"{model_type_and_model_name} is not supported yet. "
KeyError: "deberta-v2 is not supported yet. Only ['albert', 'bart', 'beit', 'bert', 'big-bird', 'bigbird-pegasus', 'blenderbot', 'blenderbot-small', 'camembert', 'convbert', 'convnext', 'data2vec-text', 'deit', 'distilbert', 'electra', 'flaubert', 'gpt2', 'gptj', 'gpt-neo', 'ibert', 'layoutlm', 'longt5', 'marian', 'mbart', 'mobilebert', 'm2m-100', 'perceiver', 'resnet', 'roberta', 'roformer', 'squeezebert', 't5', 'vit', 'xlm', 'xlm-roberta'] are supported. If you want to support deberta-v2 please propose a PR or open up an issue."

Reference: [Export :hugs: Transformers Models]
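The error names deberta-v2 rather than DeBERTaV3 because DeBERTaV3 checkpoints reuse the DeBERTaV2 architecture (the log shows the weights being loaded into DebertaV2Model), so their config.json reports model_type "deberta-v2". The sketch below is a stdlib-only way to check a local checkpoint's model type against the list printed in the KeyError above before running the export. The helper names and the checkpoint path are hypothetical, and the supported set is copied from the 4.20.1 error message, so it will not match other versions.

```python
import json
from pathlib import Path

# Model types accepted by transformers.onnx in 4.20.1, copied from the KeyError above.
SUPPORTED_IN_4_20_1 = {
    "albert", "bart", "beit", "bert", "big-bird", "bigbird-pegasus", "blenderbot",
    "blenderbot-small", "camembert", "convbert", "convnext", "data2vec-text", "deit",
    "distilbert", "electra", "flaubert", "gpt2", "gptj", "gpt-neo", "ibert", "layoutlm",
    "longt5", "marian", "mbart", "mobilebert", "m2m-100", "perceiver", "resnet",
    "roberta", "roformer", "squeezebert", "t5", "vit", "xlm", "xlm-roberta",
}


def read_model_type(checkpoint_dir: str) -> str:
    """Return the "model_type" field from a local checkpoint's config.json."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)["model_type"]


def is_exportable(checkpoint_dir: str, supported=SUPPORTED_IN_4_20_1) -> bool:
    """Check the checkpoint's model type against a set of supported model types."""
    return read_model_type(checkpoint_dir) in supported


if __name__ == "__main__":
    # Hypothetical local path, matching the checkpoint name in the log above.
    ckpt = "deberta-v3-large-conll-doccano/"
    if Path(ckpt, "config.json").exists():
        print(read_model_type(ckpt), is_exportable(ckpt))
```

For a DeBERTaV3 checkpoint this would report deberta-v2 and False against the 4.20.1 list, which is the same conclusion the CLI reaches.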

Hi! :hugs:

Update: I tried using the custom DeBERTaV2 ONNXConfig copied from this link on the HF GitHub. The code looks essentially like this:

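from collections import OrderedDict
from typing import Any, Mapping, Optional, Union

from transformers.onnx import OnnxConfig


class DebertaV2OnnxConfig(OnnxConfig):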
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        if self._config.type_vocab_size > 0:
            return OrderedDict(
                [("input_ids", dynamic_axis), ("attention_mask", dynamic_axis), ("token_type_ids", dynamic_axis)]
            )
        else:
            return OrderedDict([("input_ids", dynamic_axis), ("attention_mask", dynamic_axis)])

    @property
    def default_onnx_opset(self) -> int:
        return 12

    def generate_dummy_inputs(
        self,
        preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
        batch_size: int = -1,
        seq_length: int = -1,
        num_choices: int = -1,
        is_pair: bool = False,
        framework: Optional["TensorType"] = None,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
        tokenizer: "PreTrainedTokenizerBase" = None,
    ) -> Mapping[str, Any]:
        dummy_inputs = super().generate_dummy_inputs(preprocessor=preprocessor, framework=framework)
        if self._config.type_vocab_size == 0 and "token_type_ids" in dummy_inputs:
            del dummy_inputs["token_type_ids"]
        return dummy_inputs
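Which inputs end up in the exported graph depends on the task and on type_vocab_size. The stdlib-only sketch below mirrors the inputs property above so the logic can be checked without transformers installed; deberta_v2_onnx_inputs is a hypothetical helper, not part of the library. Assuming the fine-tuned checkpoint keeps the type_vocab_size of 0 used by the public DeBERTaV3 configs, the graph gets only input_ids and attention_mask, which is also why generate_dummy_inputs deletes token_type_ids.

```python
from collections import OrderedDict
from typing import Dict, Mapping


def deberta_v2_onnx_inputs(task: str, type_vocab_size: int) -> Mapping[str, Dict[int, str]]:
    """Mirror the `inputs` property of DebertaV2OnnxConfig as a plain function."""
    if task == "multiple-choice":
        # Multiple-choice inputs carry an extra "choice" axis between batch and sequence.
        dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
    else:
        dynamic_axis = {0: "batch", 1: "sequence"}
    names = ["input_ids", "attention_mask"]
    if type_vocab_size > 0:
        # Only models with token type embeddings get a token_type_ids input.
        names.append("token_type_ids")
    return OrderedDict((name, dynamic_axis) for name in names)


if __name__ == "__main__":
    # Token classification (e.g. CoNLL-style NER) with type_vocab_size == 0.
    print(list(deberta_v2_onnx_inputs("token-classification", 0)))
```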

After running this together with the steps from the reference linked in my original post, the export step fails with the following error:

/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:561: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  q_ids = np.arange(0, query_size)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:561: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  q_ids = np.arange(0, query_size)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:562: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  k_ids = np.arange(0, key_size)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:562: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  k_ids = np.arange(0, key_size)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:566: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  rel_pos_ids = torch.tensor(rel_pos_ids, dtype=torch.long)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:695: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = math.sqrt(query_layer.size(-1) * scale_factor)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:749: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  ).repeat(query_layer.size(0) // self.num_attention_heads, 1, 1)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:751: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  query_layer.size(0) // self.num_attention_heads, 1, 1
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:770: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = math.sqrt(pos_key_layer.size(-1) * scale_factor)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:782: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = math.sqrt(pos_query_layer.size(-1) * scale_factor)
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:783: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if key_layer.size(-2) != query_layer.size(-2):
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:112: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  output = input.masked_fill(rmask, torch.tensor(torch.finfo(input.dtype).min))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-40-694ff91713ca> in <module>()
      4                                    onnx_config,
      5                                    onnx_config.default_onnx_opset,
----> 6                                    onnx_path)

9 frames
/usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py in symbolic(g, self, mask, dim)
    133             to_i=sym_help.cast_pytorch_to_onnx["Byte"],
    134         )
--> 135         output = masked_fill(g, self, r_mask, g.op("Constant", value_t=torch.tensor(torch.finfo(self.dtype).min)))
    136         output = softmax(g, output, dim)
    137         return masked_fill(g, output, r_mask, g.op("Constant", value_t=torch.tensor(0, dtype=torch.uint8)))

AttributeError: 'torch._C.Value' object has no attribute 'dtype'

Oof, this error makes no sense to me. Can someone help me out? What's going on? :cry:

UPDATE

This issue has been solved. It was a version problem that other people probably won't run into: the error occurs with transformers 4.20.1, and the fix is to install the latest version from GitHub, which at the time of writing is 4.21.0.dev0:

$ pip install git+https://github.com/huggingface/transformers.git@main
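To confirm the upgrade took effect before retrying the export, a small check like the one below could help. It is a sketch: the 4.21 threshold is taken from this thread rather than from official release notes, and the helper names are hypothetical. The comparison deliberately ignores pre-release suffixes so that 4.21.0.dev0 counts as 4.21 (strict PEP 440 ordering would place a dev release before 4.21.0). It reads transformers.__version__, which also works on the Python 3.7 runtime shown in the tracebacks.

```python
import re


def version_tuple(version: str) -> tuple:
    """Extract the leading numeric release components, e.g. '4.21.0.dev0' -> (4, 21, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_deberta_v2_fix(version: str, minimum=(4, 21)) -> bool:
    """Return True if `version` is at least `minimum`, ignoring pre-release suffixes."""
    return version_tuple(version) >= minimum


if __name__ == "__main__":
    try:
        import transformers
    except ImportError:
        print("transformers is not installed")
    else:
        print(transformers.__version__, has_deberta_v2_fix(transformers.__version__))
```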

Thanks to ChainYo for the solution. [ref]