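Hi! I’m trying to fine-tune microsoft/deberta-base, and neuronx-cc compilation (target trn1) fails with the internal compiler error "Too many instructions after unroll!". The full log is below. Can someone please help me?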
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: ***************************************************************
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: An Internal Compiler Error has occurred
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: ***************************************************************
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: Error message: Too many instructions after unroll!
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: Error class: AssertionError
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: Error location: Unknown
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: Command line: /usr/local/bin/neuronx-cc --target=trn1 compile --framework XLA /tmp/MODULE_1_SyncTensorsGraph.28259_3249576211467925980_ip-172-16-2-97-8d78ed13-1543947-5fb53a1d79c9f.hlo.pb --output /var/tmp/neuron-compile-cache/USER_neuroncc-2.3.0.4+864822b6b/MODULE_3249576211467925980/MODULE_1_SyncTensorsGraph.28259_3249576211467925980_ip-172-16-2-97-8d78ed13-1543947-5fb53a1d79c9f/6efddf87-ab9b-4cd0-872a-783c8982367f/MODULE_1_SyncTensorsGraph.28259_3249576211467925980_ip-172-16-2-97-8d78ed13-1543947-5fb53a1d79c9f.neff --verbose=35
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: Internal details:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/CommandDriver.py", line 233, in neuronxcc.driver.CommandDriver.CommandDriver.run
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 1011, in neuronxcc.driver.commands.CompileCommand.CompileCommand.run
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 962, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 987, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 991, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/Job.py", line 297, in neuronxcc.driver.Job.SingleInputJob.run
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/Job.py", line 323, in neuronxcc.driver.Job.SingleInputJob.runOnState
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/Pipeline.py", line 30, in neuronxcc.driver.Pipeline.Pipeline.runSingleInput
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/Job.py", line 297, in neuronxcc.driver.Job.SingleInputJob.run
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/Job.py", line 323, in neuronxcc.driver.Job.SingleInputJob.runOnState
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/jobs/Frontend.py", line 583, in neuronxcc.driver.jobs.Frontend.Frontend.runSingleInput
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/driver/jobs/Frontend.py", line 379, in neuronxcc.driver.jobs.Frontend.Frontend.runXLAFrontend
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 168, in neuronxcc.starfish.penguin.Frontend.tensorizeXla
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 243, in neuronxcc.starfish.penguin.Frontend.tensorizeXlaImpl
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 244, in neuronxcc.starfish.penguin.Frontend.tensorizeXlaImpl
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 266, in neuronxcc.starfish.penguin.Frontend.tensorizeXlaImpl
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/Compile.py", line 162, in neuronxcc.starfish.penguin.Compile.compile_cu
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/Compile.py", line 164, in neuronxcc.starfish.penguin.Compile.compile_cu
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/Compile.py", line 196, in neuronxcc.starfish.penguin.Compile.compile_cu
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 432, in neuronxcc.starfish.penguin.DotTransform.PassManager.transformFunction
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 440, in neuronxcc.starfish.penguin.DotTransform.PassManager.transformFunction
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 142, in neuronxcc.starfish.penguin.DotTransform.DotTransform.runOnFunction
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 196, in neuronxcc.starfish.penguin.DotTransform.DotTransform.run_with_exception_handling
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 178, in neuronxcc.starfish.penguin.DotTransform.DotTransform.run_with_exception_handling
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 208, in neuronxcc.starfish.penguin.DotTransform.DotTransform.timed_run_
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 210, in neuronxcc.starfish.penguin.DotTransform.DotTransform.timed_run_
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 211, in neuronxcc.starfish.penguin.DotTransform.DotTransform.timed_run_
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 240, in neuronxcc.starfish.penguin.DotTransform.DotTransform.run_
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 241, in neuronxcc.starfish.penguin.DotTransform.DotTransform.run_
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 334, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformFunction
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 330, in neuronxcc.starfish.penguin.DotTransform.DotTransform.runTransforms
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: File "neuronxcc/starfish/penguin/targets/sunda/passes/SundaSizeTiling.py", line 2094, in neuronxcc.starfish.penguin.targets.sunda.passes.SundaSizeTiling.SundaSizeTiling.afterStmtTransform
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: Version information:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: NeuronX Compiler version 2.3.0.4+864822b6b
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: HWM version 2.3.0.0-de45371a7
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: NEFF version Dynamic
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: TVM not available
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: NumPy version 1.20.3
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: MXNet not available
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]:
05/10/2023 09:40:25 AM ERROR 1544620 [neuronx-cc]: Artifacts stored in: /home/ubuntu/ssl/Notebooks/neuronxcc-l7smglxd
Traceback (most recent call last):
File "/usr/local/bin/neuronx-cc", line 8, in <module>
sys.exit(main())
File "neuronxcc/driver/CommandDriver.py", line 272, in neuronxcc.driver.CommandDriver.main
File "neuronxcc/driver/CommandDriver.py", line 239, in neuronxcc.driver.CommandDriver.CommandDriver.run
UnboundLocalError: local variable 'states' referenced before assignment
....
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
2023-05-10 09:41:40.000053: INFO ||NCC_WRAPPER||: Keyboard interrupt, exiting compilation
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "neuronxcc/driver/CommandDriver.py", line 190, in neuronxcc.driver.CommandDriver.CommandDriver.print_dots
KeyboardInterrupt
2023-05-10 09:41:40.388399: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at tpu_execute_op.cc:266 : INTERNAL: neuronx-cc compilation failed.
2023-05-10 09:41:40.602841: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] StackTrace:
2023-05-10 09:41:40.602883: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] *** Begin stack trace ***
2023-05-10 09:41:40.602888: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] tensorflow::CurrentStackTrace()
2023-05-10 09:41:40.602892: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] xla::util::ReportComputationError(tensorflow::Status const&, absl::lts_20211102::Span<xla::XlaComputation const* const>, absl::lts_20211102::Span<xla::Shape const* const>)
2023-05-10 09:41:40.602896: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] xla::XrtComputationClient::ExecuteComputation(xla::ComputationClient::Computation const&, absl::lts_20211102::Span<std::shared_ptr<xla::ComputationClient::Data> const>, std::string const&, xla::ComputationClient::ExecuteComputationOptions const&)
2023-05-10 09:41:40.602900: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2023-05-10 09:41:40.602904: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] xla::util::MultiWait::Complete(std::function<void ()> const&)
2023-05-10 09:41:40.602907: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2023-05-10 09:41:40.602910: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2023-05-10 09:41:40.602914: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2023-05-10 09:41:40.602917: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] clone
2023-05-10 09:41:40.602921: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] *** End stack trace ***
2023-05-10 09:41:40.602924: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2023-05-10 09:41:40.602928: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] Status: INTERNAL: From /job:localservice/replica:0/task:0:
2023-05-10 09:41:40.602931: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] 2 root error(s) found.
2023-05-10 09:41:40.602935: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] (0) INTERNAL: neuronx-cc compilation failed.
2023-05-10 09:41:40.602938: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] [[{{node XRTExecute}}]]
2023-05-10 09:41:40.602942: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] [[XRTExecute_G15]]
2023-05-10 09:41:40.602945: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] (1) INTERNAL: neuronx-cc compilation failed.
2023-05-10 09:41:40.602962: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] [[{{node XRTExecute}}]]
2023-05-10 09:41:40.602966: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] 0 successful operations.
2023-05-10 09:41:40.602970: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] 0 derived errors ignored.
2023-05-10 09:41:40.602973: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] Recent warning and error logs:
2023-05-10 09:41:40.602977: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] OP_REQUIRES failed at tpu_execute_op.cc:266 : INTERNAL: neuronx-cc compilation failed.