Hugging Face Forums
torch.distributed.elastic.multiprocessing.errors.ChildFailedError
🤗Transformers
ekjot1999
January 12, 2023, 5:41pm
Hi @IdoAmit198, I'm facing the same issue. Have you resolved it?
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 3 (pid: 10561) of binary