When running the script for a full fine-tuning of Stable Diffusion XL from the examples folder, I get the following error:
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1013, in launch_command
    tpu_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 756, in tpu_launcher
    xmp.spawn(PrepareForLaunch(main_function), args=(), nprocs=args.num_processes)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/runtime.py", line 82, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 38, in spawn
    return pjrt.spawn(fn, nprocs, start_method, args)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/_internal/pjrt.py", line 198, in spawn
    return _run_singleprocess(spawn_fn)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/runtime.py", line 82, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/_internal/pjrt.py", line 102, in _run_singleprocess
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/_internal/pjrt.py", line 178, in __call__
    self.fn(runtime.global_ordinal(), *self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 562, in __call__
    self.launcher(*args)
TypeError: main() missing 1 required positional argument: 'args'
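From the traceback, it looks like the TPU path wraps the training function in PrepareForLaunch and spawns it via xmp.spawn with args=(), so main() ends up being invoked with no positional arguments. A minimal sketch of that failure mode, with no TPU involved (my reading of the call chain, not Accelerate's actual code):

def main(args):  # like the training script, requires one positional argument
    print(args)

# PrepareForLaunch ultimately calls self.launcher(*args) with an empty tuple,
# which reduces to a bare call:
main()  # TypeError: main() missing 1 required positional argument: 'args'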
It appears that this error is caused by Hugging Face Accelerate not passing the arguments into the training function, which the script defines as main(args) (see the entrypoint snippet after the config). How do I fix this? Below is my config (I'm using Google Colab with a TPU):
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: TPU
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
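For reference, the script's entrypoint follows the usual diffusers pattern, roughly like this (simplified; parse_args is the script's own CLI parser):

def main(args):
    # ... set up the models, data, and training loop using args ...
    ...

if __name__ == "__main__":
    args = parse_args()
    main(args)

Running the script directly works because the __main__ block parses the CLI arguments and passes them in; as far as I can tell, the TPU launcher instead imports the script and calls main() itself, so that step is skipped.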