Computer turns off without a warning


I am trying to finetune Flan-T5 on my desktop computer, which has two Titan X GPUs.

TensorFlow seems to work fine up to Flan-T5-base. It runs out of memory when using the large model.

However, I am getting some very weird results with the PyTorch version of the model. The computer just shuts down when the network does not fit into GPU memory.

I used the command nvidia-smi -pl 150 to reduce the power limit. It seems to work when I first set up an environment. If I try to run the same script again, the computer shuts down once more!
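In case it helps with reproducing this, here is a minimal sketch of how the power limit and the actual power draw could be checked from Python while the script runs. It assumes nvidia-smi is on the PATH and that setting the limit is done with root privileges. The helper names (power_limit_command, query_command, parse_query_output, read_gpu_status) are made up for illustration and are not part of any library.

```python
import subprocess

# Fields requested from nvidia-smi's --query-gpu interface.
QUERY_FIELDS = ["index", "power.draw", "power.limit", "memory.used", "memory.total"]


def power_limit_command(watts, gpu_index=None):
    """Build the nvidia-smi command that caps board power (usually needs root)."""
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]  # restrict the change to one GPU
    cmd += ["-pl", str(watts)]
    return cmd


def query_command():
    """Build the nvidia-smi command that reports power and memory per GPU."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]


def parse_query_output(text):
    """Turn the CSV output into one dict per GPU; values are kept as strings."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def read_gpu_status():
    """Run the query once and return the parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(query_command(), capture_output=True, text=True, check=True)
    return parse_query_output(result.stdout)
```

Calling read_gpu_status() before each run would show whether the 150 W limit is still in effect on both cards or has been reset in the meantime.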

However, if I set up a new environment from scratch, it works again.

I don’t understand why this is happening. TensorFlow simply throws an OOM error if there’s a problem. The shutdown only happens with PyTorch.