BERT inference with Hugging Face Transformers and AWS Inferentia

Hi, I tried to run inference with Hugging Face Transformers on an inf1.6xlarge instance, following this tutorial: Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia.

I installed PyTorch Neuron on AWS as instructed here: Get Started with PyTorch Neuron — AWS Neuron Documentation.

Versions:

transformers             4.12.3
tensorflow               1.15.5
tensorflow-estimator     1.15.1
torch                    1.12.1
torch-neuron             1.12.1.2.7.1.0

When I compile the model with torch.neuron.trace, I get the following error:
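For reference, my compile step follows the tutorial; a minimal sketch of what I am running is below (the model id and max_length are the tutorial's placeholders, my actual fine-tuned model may differ):

```python
import torch
import torch.neuron  # from the AWS Neuron SDK, installed per the linked guide
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Tutorial placeholder model; my script uses my own fine-tuned checkpoint.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# torchscript=True makes the model return tuples, which tracing requires.
model = AutoModelForSequenceClassification.from_pretrained(model_id, torchscript=True)

# Build a dummy input padded to the fixed sequence length used for tracing.
max_length = 128
dummy = tokenizer(
    "dummy input which will be padded later",
    max_length=max_length,
    padding="max_length",
    return_tensors="pt",
)
neuron_inputs = tuple(dummy.values())

# This is the call that raises the RuntimeError shown below.
model_neuron = torch.neuron.trace(model, neuron_inputs)
```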


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 2
      1 # compile model with torch.neuron.trace and update config
----> 2 model_neuron = torch.neuron.trace(model, neuron_inputs)
      3 model.config.update({"traced_sequence_length": max_length})

File ~/aws_neuron_venv_pytorch/lib64/python3.8/site-packages/torch_neuron/convert.py:217, in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, single_fusion_ratio_threshold, _neuron_trace, compiler_args, optimizations, separate_weights, verbose, **kwargs)
    215     logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    216     neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 217 cu.stats_post_compiler(neuron_graph)
    219 # Wrap the compiled version of the model in a script module. Note that this is
    220 # necessary for torch==1.8.1 due to the usage of `torch.classes.model.Model`. The
    221 # custom class must be a submodule of the traced graph.
    222 neuron_graph = AwsNeuronGraphModule(neuron_graph)

File ~/aws_neuron_venv_pytorch/lib64/python3.8/site-packages/torch_neuron/convert.py:530, in CompilationUnit.stats_post_compiler(self, neuron_graph)
    526             logger.info(' => {}: {} {}'.format(
    527                 name, remaining_count, supported_string))
    529 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 530     raise RuntimeError(
    531         "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    533 if percent_operations_compiled < 50.0:
    534     logger.warning(
    535         "torch.neuron.trace was unable to compile > 50% of the operators in the compiled model!")

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!


Do you know how to address this issue?

Please help me understand what I have missed here. I have also tried Transformers versions > 2 and torch-neuron==1.9.1.* releases, and I get the same error.

Thank you