When running the script for a full fine-tuning of Stable Diffusion XL from the examples folder, I get the following error:
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1013, in launch_command
    tpu_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 756, in tpu_launcher
    xmp.spawn(PrepareForLaunch(main_function), args=(), nprocs=args.num_processes)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/runtime.py", line 82, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 38, in spawn
    return pjrt.spawn(fn, nprocs, start_method, args)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/_internal/pjrt.py", line 198, in spawn
    return _run_singleprocess(spawn_fn)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/runtime.py", line 82, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/_internal/pjrt.py", line 102, in _run_singleprocess
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/_internal/pjrt.py", line 178, in __call__
    self.fn(runtime.global_ordinal(), *self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 562, in __call__
    self.launcher(*args)
TypeError: main() missing 1 required positional argument: 'args'
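From the traceback, it looks like the TPU path wraps the training function in PrepareForLaunch and spawns it via xmp.spawn with args=(), so main() ends up being invoked with no positional arguments. A minimal sketch of that failure mode, with no TPU involved (my reading of the call chain, not Accelerate's actual code):

def main(args):  # like the training script, requires one positional argument
    print(args)

# PrepareForLaunch ultimately calls self.launcher(*args) with an empty tuple,
# which reduces to a bare call:
main()  # TypeError: main() missing 1 required positional argument: 'args'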
It appears that this error is caused by Hugging Face Accelerate not passing the arguments into the training function, which the script defines as main(args) (see the entrypoint snippet after the config). How do I fix this? Below is my config (I'm using Google Colab with a TPU):
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: TPU
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
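For reference, the script's entrypoint follows the usual diffusers pattern, roughly like this (simplified; parse_args is the script's own CLI parser):

def main(args):
    # ... set up the models, data, and training loop using args ...
    ...

if __name__ == "__main__":
    args = parse_args()
    main(args)

Running the script directly works because the __main__ block parses the CLI arguments and passes them in; as far as I can tell, the TPU launcher instead imports the script and calls main() itself, so that step is skipped.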