Code RuntimeError

The earlier problem was solved by setting export NCCL_P2P_DISABLE=1. However, I now have a new problem: after one piece of data is loaded, an error is reported, even though NCCL_P2P_DISABLE=1 is still set. The log is below.
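For anyone reproducing the workaround: export only affects processes started afterwards from the same shell. If you launch from a Python wrapper instead, a minimal sketch is to pass the variable explicitly in the child's environment. The helper name launch_with_p2p_disabled is hypothetical. Child processes inherit this environment, including the per-GPU workers that accelerate launch spawns.

```python
import os
import subprocess
import sys
from typing import List


def launch_with_p2p_disabled(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run `cmd` with NCCL_P2P_DISABLE=1 added to its environment.

    Hypothetical helper: it mirrors running `export NCCL_P2P_DISABLE=1`
    and then `cmd` in the same shell. Child processes inherit the variable.
    """
    # Copy the current environment and add (or override) the NCCL setting.
    env = dict(os.environ, NCCL_P2P_DISABLE="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Probe command that only reports what the child process sees.
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('NCCL_P2P_DISABLE'))"]
    print(launch_with_p2p_disabled(probe).stdout.strip())  # prints "1"
```

In a real run, cmd would be the same accelerate launch command used above. The probe only confirms that the variable reaches the child process.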
warnings.warn(
number of problems for this task is 164
1%|▋ | 1/82 [00:54<1:13:20, 54.33s/it]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 61144 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 61145) of binary: /root/anaconda/envs/bigcode1/bin/python
Traceback (most recent call last):
  File "/root/anaconda/envs/bigcode1/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda/envs/bigcode1/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/root/anaconda/envs/bigcode1/lib/python3.8/site-packages/accelerate/commands/launch.py", line 977, in launch_command
    multi_gpu_launcher(args)
  File "/root/anaconda/envs/bigcode1/lib/python3.8/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/root/anaconda/envs/bigcode1/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/root/anaconda/envs/bigcode1/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda/envs/bigcode1/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

01.py FAILED

Failures:
  <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2023-10-22_12:57:04
  host      : rt-res-public9-6f8f8bd4fc-92zc9
  rank      : 1 (local_rank: 1)
  exitcode  : -6 (pid: 61145)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 61145
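The summary reports exitcode -6 together with Signal 6 (SIGABRT). This follows the POSIX convention of Python's subprocess and multiprocessing modules: a negative return code -N means the child was terminated by signal N instead of exiting with a status. So rank 1 was aborted, for example by abort() in native code, rather than exiting with a Python error status. The small decoder below is a hypothetical helper, not part of torch or accelerate. It shows the mapping and reproduces it with a child process that calls os.abort().

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a child-process return code into a readable description.

    Hypothetical helper following the subprocess/multiprocessing convention
    (POSIX only): a negative value -N means "terminated by signal N".
    """
    if returncode < 0:
        signum = -returncode
        return f"killed by signal {signum} ({signal.Signals(signum).name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_exit(-6))  # killed by signal 6 (SIGABRT)
    # Reproduce the situation: a child process that calls abort().
    child = subprocess.run([sys.executable, "-c", "import os; os.abort()"],
                           capture_output=True)
    print(describe_exit(child.returncode))  # killed by signal 6 (SIGABRT)
```

The SIGTERM sent to process 61144 earlier in the log is the launcher shutting down the remaining worker after rank 1 failed. It is a consequence of the abort, not a separate error.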