Computer turns off without a warning

stelman · June 26, 2023, 2:17pm

Hello,

I am trying to finetune Flan-T5 on my desktop computer, which has two Titan X GPUs.

TensorFlow seems to work fine up to Flan-T5-base, but it runs out of memory with the large model.

However, I am getting some very weird results with the PyTorch version of the model: the computer just shuts down when the network does not fit into memory.

I used the command nvidia-smi -pl 150 to reduce the power limit. It seems to work when I first set up an environment. If I try to run the same script again, the computer shuts down once more!
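For anyone trying to reproduce this, here is a minimal sketch of how the same limit could be applied to each GPU separately from Python. The helper names (power_limit_commands, apply_power_limit) and the GPU indices 0 and 1 are assumptions for illustration; only the 150 W value and the -pl flag come from the command above. It defaults to a dry run that just prints the commands, since actually changing the limit normally needs administrator rights.

```python
import shutil
import subprocess


def power_limit_commands(watts, gpu_indices):
    """Build one `nvidia-smi -i <index> -pl <watts>` command per GPU."""
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)] for i in gpu_indices]


def apply_power_limit(watts, gpu_indices, dry_run=True):
    """Print (dry run) or execute the power-limit command for each GPU.

    Hypothetical helper: executing the commands usually requires root,
    so the default only shows what would be run.
    """
    commands = power_limit_commands(watts, gpu_indices)
    for cmd in commands:
        if dry_run or shutil.which("nvidia-smi") is None:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Assumed indices 0 and 1 for the two Titan X cards.
    apply_power_limit(150, [0, 1])
```

Calling apply_power_limit(150, [0, 1], dry_run=False) would run the commands for real, so it is worth checking the printed commands first.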

However, if I set up a new environment from scratch it’s all good.

I don’t understand why this is happening. TensorFlow simply throws an OOM error if there’s a problem. The shutdowns only happen with PyTorch.
