[SOLVED] accelerate.Accelerator(): CUDA error: invalid device ordinal

Thanks @muellerzr. Will do. I was finally able to get a “good node” that runs without this error by adding each failing node to the SLURM --exclude list and resubmitting; it worked after about 6 tries.

I’ll open the issue so we can try to figure out what distinguishes a “good node” from a “bad node”.
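
In case it helps anyone comparing nodes, here is a minimal diagnostic sketch to run on each node before calling accelerate.Accelerator(). It assumes (not confirmed in this thread) that the error comes from a process requesting a GPU index that is not visible on that node, and that the local rank is exposed through LOCAL_RANK or SLURM_LOCALID. The function names check_device_ordinal and visible_device_count are made up for illustration.

```python
import os
import socket


def check_device_ordinal(env, device_count):
    """Return (ok, message) for the device index this process would request.

    Assumption: the process uses its local rank as the CUDA device index.
    """
    # LOCAL_RANK is set by torchrun-style launchers; SLURM_LOCALID by srun.
    local_rank = int(env.get("LOCAL_RANK", env.get("SLURM_LOCALID", "0")))
    visible = env.get("CUDA_VISIBLE_DEVICES", "<unset>")
    # A valid ordinal must be smaller than the number of visible devices.
    ok = 0 <= local_rank < device_count
    status = "ok" if ok else "INVALID ORDINAL"
    message = (
        f"host={socket.gethostname()} local_rank={local_rank} "
        f"device_count={device_count} CUDA_VISIBLE_DEVICES={visible} -> {status}"
    )
    return ok, message


def visible_device_count():
    """Number of GPUs PyTorch can see; 0 if torch is missing or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    ok, message = check_device_ordinal(os.environ, visible_device_count())
    print(message)
```

Printing this from every task on a “good node” and a “bad node” should show whether the bad ones expose fewer GPUs (or a different CUDA_VISIBLE_DEVICES) than the number of processes launched there.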