Hi. I am trying to run a DreamBooth training Space on a rented A10G. Once it starts running, it doesn't stop. I have run it twice, each time waiting almost an hour before factory-rebooting the Space. The last few lines of the logs read:
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
But each time, the app still reports that training is running (and is probably still charging me). Any advice?
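One way to act on the log's hint while also keeping a crashed run from sitting in a "running" state is to launch training as a child process. That process gets `CUDA_LAUNCH_BLOCKING=1`, so the error is reported at the call that actually failed, and a wall-clock timeout kills the run if it hangs. This is only a minimal sketch. It assumes the training can be started as a separate command. `run_training` and the script name `train_dreambooth.py` are hypothetical placeholders, not necessarily what this Space actually runs.

```python
import os
import subprocess
import sys


def run_training(cmd, timeout_s=None):
    """Run a training command with synchronous CUDA error reporting.

    Hypothetical helper: returns the child's exit code, or None if it
    exceeded timeout_s seconds (subprocess.run kills the child in that case).
    """
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the
    # stack trace points at the failing call (at the cost of speed).
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    try:
        result = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child has already been killed here; report "timed out".
        return None
    return result.returncode


if __name__ == "__main__":
    # Hypothetical script name; a non-zero exit means training failed.
    code = run_training([sys.executable, "train_dreambooth.py"], timeout_s=3600)
    sys.exit(1 if code is None else code)
```

Exiting with the child's non-zero code, rather than leaving a parent process alive, makes the failure visible instead of hidden behind a UI that still says "running". Whether the Space then stops billing depends on how the hosting platform handles a crashed app, which this sketch does not control.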