Hi, I tried to run inference with Hugging Face Transformers on an inf1.6xlarge instance, following this tutorial: Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia.
I installed PyTorch Neuron as instructed here: Get Started with PyTorch Neuron — AWS Neuron Documentation.
Versions:
transformers 4.12.3
tensorflow 1.15.5
tensorflow-estimator 1.15.1
torch 1.12.1
torch-neuron 1.12.1.2.7.1.0
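For context, my compile step follows the tutorial roughly like this (a sketch from memory; the model id and max_length here are the tutorial's example values, not something specific to my setup, and running it requires the Neuron SDK on an inf1 instance):

```python
import torch
import torch.neuron  # registers torch.neuron.trace; requires the torch-neuron package
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model id taken from the tutorial (assumption, not my exact model)
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torchscript=True)

# Build fixed-shape example inputs for tracing (Neuron needs static shapes)
max_length = 128
dummy = "dummy input for neuron tracing"
embeddings = tokenizer(dummy, max_length=max_length, padding="max_length", return_tensors="pt")
neuron_inputs = tuple(embeddings.values())

# This is the call that raises the RuntimeError shown below
model_neuron = torch.neuron.trace(model, neuron_inputs)
model.config.update({"traced_sequence_length": max_length})
```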
When I compile the model with torch.neuron.trace, I get the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[5], line 2
1 # compile model with torch.neuron.trace and update config
----> 2 model_neuron = torch.neuron.trace(model, neuron_inputs)
3 model.config.update({"traced_sequence_length": max_length})
File ~/aws_neuron_venv_pytorch/lib64/python3.8/site-packages/torch_neuron/convert.py:217, in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, single_fusion_ratio_threshold, _neuron_trace, compiler_args, optimizations, separate_weights, verbose, **kwargs)
215 logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
216 neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 217 cu.stats_post_compiler(neuron_graph)
219 # Wrap the compiled version of the model in a script module. Note that this is
220 # necessary for torch==1.8.1 due to the usage of `torch.classes.model.Model`. The
221 # custom class must be a submodule of the traced graph.
222 neuron_graph = AwsNeuronGraphModule(neuron_graph)
File ~/aws_neuron_venv_pytorch/lib64/python3.8/site-packages/torch_neuron/convert.py:530, in CompilationUnit.stats_post_compiler(self, neuron_graph)
526 logger.info(' => {}: {} {}'.format(
527 name, remaining_count, supported_string))
529 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 530 raise RuntimeError(
531 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
533 if percent_operations_compiled < 50.0:
534 logger.warning(
535 "torch.neuron.trace was unable to compile > 50% of the operators in the compiled model!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
Do you know how to address this issue? Please help me understand what I have missed here. I have also tried transformers versions > 2 and torch-neuron==1.9.1.* versions, and I get the same error.
Thank you