Dear HF community:
I am trying to run diffusers_training_example.ipynb on a subset of the CelebA-HQ dataset. Specifically:

    config.dataset_name = "huggan/CelebA-faces"
    dataset = load_dataset(config.dataset_name, split="train")
    dataset.set_transform(transform)
    from torch.utils.data import Subset
    dataset = Subset(dataset, range(5000))
    ...
    # Then only changing num_processes: 1 -> 4
    notebook_launcher(train_loop, args, num_processes=4)

I then got this error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
With `num_processes=1` it works fine. I am using Lambda Cloud GPU instances and tried both 8xA100 and 8xTesla V100, but got the same error on each. My PyTorch version is 1.12. I searched a bit and tried a few methods, but none of them worked. The GitHub issue "`notebook_launcher` fails with `num_processes>=2`" (Issue #182 in huggingface/accelerate) seems similar. Do you know what is going on, and can you give me some pointers? Thank you very much!
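
For anyone reproducing this, here is a minimal diagnostic sketch. It assumes (this is a guess, not something confirmed in this thread) that the error appears because CUDA was already initialized in the notebook process before `notebook_launcher` started its worker processes. The helper name `cuda_initialized_in_this_process` is hypothetical; `torch.cuda.is_initialized()` is the PyTorch call it relies on.

```python
import importlib.util


def cuda_initialized_in_this_process():
    """Report whether this Python process has already initialized CUDA.

    Returns True or False when PyTorch is installed, and None when it is not.
    Hypothetical helper, written only for this diagnostic.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not importable in this environment
    import torch

    # is_initialized() only inspects PyTorch's state; it does not create a CUDA context.
    return torch.cuda.is_initialized()


status = cuda_initialized_in_this_process()
if status is None:
    print("PyTorch is not installed in this environment.")
elif status:
    print("CUDA is already initialized in this process.")
else:
    print("CUDA has not been initialized in this process yet.")
```

If this prints that CUDA is already initialized when run in a cell just before `notebook_launcher(...)`, some earlier cell probably touched the GPU (for example by moving a model or tensor to `cuda`). Under the assumption above, restarting the kernel and keeping all GPU work inside `train_loop` would be the first thing to try.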