Torchrun uses more VRAM than running the script with Python directly

Hi there

I created a small training script using the Hugging Face Transformers Trainer class to finetune a Mistral 7B model.
Right now I am running some tests to compare VRAM usage across different configurations, and there is something I do not understand. I am not sure whether this behaviour is normal; most probably I am doing something wrong.
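
For context, the script looks roughly like this (a simplified sketch; the model name, dataset, and TrainingArguments values here are just placeholders, the real version is in the repo linked at the bottom):

```python
# test.py -- simplified sketch, not the exact script from the repo
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset: one text file, tokenized for causal LM training
dataset = load_dataset("text", data_files="train.txt")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```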

  • If I run the script the normal Python way, I see about 19 GB of VRAM usage (python test.py)
  • If I run it with torchrun, it is around 25 GB (torchrun --nproc_per_node 1 test.py); see the measurement sketch right after this list
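
Besides watching nvidia-smi, something like this (a small addition, not necessarily in the repo) is how I would read the peak allocation from inside the script:

```python
import torch

# Run after trainer.train(); reports the peak CUDA allocation on the
# current device. nvidia-smi usually shows a higher number because it
# also includes the CUDA context and PyTorch's cached-but-unused memory.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated: {peak_gib:.2f} GiB")
```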

Both times only one GPU is being used, which I checked with nvidia-smi. If I run torchrun with 2 GPUs (--nproc_per_node 2), both consume around 25 GB, which I think is normal. But why is the VRAM usage so much higher with torchrun even when only one GPU is used?
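
Could it be related to torchrun setting the distributed environment variables even for a single process? A quick check like this (just a diagnostic sketch, not part of the repo) should show whether the script ends up in distributed mode:

```python
import os

import torch.distributed as dist

# torchrun exports these even with --nproc_per_node 1; a plain
# `python test.py` run leaves them unset.
print({k: os.environ.get(k) for k in ("RANK", "LOCAL_RANK", "WORLD_SIZE")})

# If the Trainer picked up the torchrun environment, the default
# process group will be initialized by the time training starts.
print("distributed initialized:", dist.is_available() and dist.is_initialized())
```

If the process group is initialized, I assume the model gets wrapped in DistributedDataParallel, which would allocate extra gradient buffers, but I am not sure whether that alone explains a 6 GB difference.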

The code is in this GitHub repo. Thanks a lot in advance to everyone who takes the time to help me here :slight_smile: