notebook_launcher in diffusers_training_example.ipynb fails with num_processes>=2

Dear HF community:

I am trying to run diffusers_training_example.ipynb on a subset of the CelebA-HQ dataset. Specifically:

config.dataset_name = "huggan/CelebA-faces"
dataset = load_dataset(config.dataset_name, split="train")

dataset.set_transform(transform)

from torch.utils.data import Subset
dataset = Subset(dataset, range(5000))

...

# The only other change: num_processes from 1 to 4
notebook_launcher(train_loop, args, num_processes=4)

I then got the following error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.

However, with num_processes=1 it runs fine. I used Lambda Cloud GPU instances and tried both 8xA100 and 8xTesla V100, but got the same error on each. My PyTorch version is 1.12. I searched a bit and tried a few methods, but they didn’t work. Issue #182 in huggingface/accelerate ("`notebook_launcher` fails with `num_processes>=2`") seems similar. Do you know what is going on, and can you give me some pointers? Thank you very much!
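
In case it helps narrow this down, here is a minimal sketch of a check that could be run in the notebook right before calling `notebook_launcher`. It assumes the error comes from CUDA already being initialized in the notebook process before the workers are forked, which is what the message suggests but which I have not confirmed. `cuda_already_initialized` is a hypothetical helper name of my own; the only library call it relies on is `torch.cuda.is_initialized()`.

```python
import importlib.util


def cuda_already_initialized() -> bool:
    """Return True if this process has already initialized CUDA.

    Hypothetical helper for diagnosis only; not part of accelerate or diffusers.
    """
    # If torch is not installed, it cannot have initialized CUDA.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    # Reports whether PyTorch's CUDA state has been initialized in this process.
    return torch.cuda.is_initialized()


if cuda_already_initialized():
    # Assumption: this is the situation the RuntimeError above complains about.
    print("CUDA is already initialized in this process.")
else:
    print("CUDA has not been initialized in this process yet.")
```

If this prints that CUDA is already initialized, some earlier cell (for example one that moved a model or tensor to the GPU) probably touched CUDA, and restarting the kernel before launching might be worth trying.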

By the way, regarding training on multiple GPUs, I searched a bit and it seems Lambda Cloud GPU instances can do the job, contingent on this issue being solved.

My current understanding is that Colab Pro and Pro+ provide only one GPU, although a good one. Amazon SageMaker seems pretty complicated, its documentation doesn’t seem well maintained, and its price isn’t competitive with Lambda Cloud. If you have any tips on where to access multiple GPUs, please let me know. For now I just want to test the waters in the cloud. Thank you very much.