Out-of-memory error with multi-GPU training, but no error with a single GPU?

I'm using a g5.24xlarge instance and this code to fine-tune Stable Diffusion. If I train with just one GPU the script runs fine, but when I set up multi-GPU training with accelerate config, the script runs out of memory. The Transformers version is 4.36.0.dev0. Does anybody know what could be causing this?

The only other message I can see is this one:

Found unsupported HuggingFace version 4.36.0.dev0 for automated tensor parallelism. HuggingFace modules will not be automatically distributed. You can use smp.tp_register_with_module API to register desired modules for tensor parallelism, or directly instantiate an smp.nn.DistributedModule. Supported HuggingFace transformers versions for automated tensor parallelism: ['4.17.0', '4.20.1', '4.21.0']

I have tried installing those supported versions, but they appear to have errors with CLIP.
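To help narrow this down, here is a minimal sketch for logging per-GPU memory so the single-GPU and multi-GPU runs can be compared. It assumes PyTorch with CUDA is installed. The helper names to_gib and gpu_memory_report are hypothetical and are not part of the training script. Note that the allocated and peak figures only cover the current process, so under a multi-process launch each process reports its own usage.

```python
def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / (1024 ** 3)


def gpu_memory_report():
    """Return one dict per visible CUDA device with memory figures in GiB.

    Returns an empty list if PyTorch is missing or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []

    rows = []
    for idx in range(torch.cuda.device_count()):
        # Device-wide free and total memory, as reported by the driver.
        free_b, total_b = torch.cuda.mem_get_info(idx)
        rows.append({
            "device": idx,
            "free_gib": round(to_gib(free_b), 2),
            "total_gib": round(to_gib(total_b), 2),
            # Memory held by tensors in this process only.
            "allocated_gib": round(to_gib(torch.cuda.memory_allocated(idx)), 2),
            "peak_allocated_gib": round(to_gib(torch.cuda.max_memory_allocated(idx)), 2),
        })
    return rows


if __name__ == "__main__":
    for row in gpu_memory_report():
        print(row)
```

Calling gpu_memory_report() right before the step that fails, in both setups, should show whether one device is already close to its limit before training starts.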