Heyho guys,
I'm sorry in advance for taking up your valuable time, but I have a problem following the DreamBooth tutorial (DreamBooth).
I followed every step and used "accelerate config default",
but I get the following error message when I run the step that uses train_dreambooth_flux.py with "accelerate launch train_dreambooth_flux.py --XXX",
where XXX stands for all the recommended arguments.
Traceback (most recent call last):
  File "/home/tim/miniconda3/envs/flux/bin/accelerate", line 11, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/tim/miniconda3/envs/flux/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/tim/miniconda3/envs/flux/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1159, in launch_command
    multi_gpu_launcher(args)
  File "/home/tim/miniconda3/envs/flux/lib/python3.11/site-packages/accelerate/commands/launch.py", line 793, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/tim/miniconda3/envs/flux/lib/python3.11/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/home/tim/miniconda3/envs/flux/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/miniconda3/envs/flux/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=======================================================
train_dreambooth_flux.py FAILED
-------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
-------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-12-15_15:02:23
  host      : chariot
  rank      : 1 (local_rank: 1)
  exitcode  : -9 (pid: 713731)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 713731
=======================================================
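As far as I can tell, the negative exit code only means that the child process was terminated by the signal with that number, which matches the "Signal 9 (SIGKILL)" line in the report. This is a minimal sketch of how I decoded it with Python's standard signal module, assuming a Linux machine; the helper name describe_exit_code is my own and not part of accelerate or torch:

```python
import signal


def describe_exit_code(exitcode: int) -> str:
    """Turn a subprocess-style exit code into a readable description."""
    if exitcode < 0:
        # By POSIX/subprocess convention, a negative code -N means
        # "terminated by signal number N".
        return f"killed by {signal.Signals(-exitcode).name}"
    return f"exited with status {exitcode}"


# The exit code reported for rank 1 in the failure report.
print(describe_exit_code(-9))
```

Since SIGKILL cannot be caught by the process that receives it, the training script probably had no chance to print its own Python traceback, which would fit the empty error_file entry.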
Is there any chance someone could help?
I'm using a machine with two NVIDIA GeForce RTX 4090 GPUs (24 GB each).
I don't have problems running Diffusers models, so it's probably not a package dependency problem.
I don't want a full solution, just maybe a quick tip on where I can go from here with this error message.
For example, would it be recommended to enable logging and look there for more specific feedback? Not sure if I would understand anything…
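One thing I could try on my own: a SIGKILL that comes from outside the process is often the Linux out-of-memory killer, so I could watch the host RAM (not the GPU memory) while the two processes load the model. This is only a rough sketch, assuming a Linux machine that provides /proc/meminfo; parse_meminfo and available_gib are names I made up for illustration:

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Most fields look like "MemAvailable:   33554432 kB".
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gib(info: dict[str, int]) -> float:
    """Convert the MemAvailable field (reported in kB) to GiB."""
    return info["MemAvailable"] / (1024 ** 2)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # this file only exists on Linux
        info = parse_meminfo(meminfo.read_text())
        print(f"available host RAM: {available_gib(info):.1f} GiB")
```

If I run this repeatedly while the training starts and the available memory drops toward zero right before the kill, that would point to host RAM rather than the packages. Whether that is really the cause here is just my guess.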
Thanks in advance, and sorry for my bad English,
Cheers