Manual splitting of model across multi-GPU setup

Dalhimar · December 28, 2023, 9:12am

I have been doing some testing with training LoRAs and have a question that I can't find an answer to.
Here is my hardware setup:
Intel 3435X
128GB DDR5 in 8 channel
2x3090 FE cards with NVlink
Dual boot Ubuntu/Windows

I use Ubuntu as my Dev and training setup. I am using Oobabooga Text gen webui as a GUI and the training pro extension.
I am running tests training Xwin 70B via transformers with the following flags:

--load-in-4bit
--use_double_quant
--auto-devices

I can train the model at rank 64, alpha 128, max context length 45, batch size 1, gradient accumulation 5. I target the q, k, and v projections, with a NEFTune noise scale of 2.

I can get decent results from these settings, but I would like more wiggle room to experiment. This is where my question comes in.

Edit to fix VRAM amounts.

During initial loading, both GPUs load up fine. Before training they're sitting at about 17.3GB on GPU 0 and 20.3GB on GPU 1. Once I start training, the values go up to 21.25GB on GPU 0 and 24.178GB on GPU 1. If I try to adjust much of anything, I hit CUDA OOM errors. I am pretty sure this happens because of the memory load imbalance once training starts, as it tries to overfill GPU 1.

Is there a way that I can manually control the layer split between GPUs? Right now it appears as though it is using (# of layers)/(# of GPUs) to try and split the model evenly between the GPUs, without accounting for the overhead of the various code libraries that also have to be loaded to GPU. I’d like to be able to offload more layers to GPU 0 in order to take advantage of my unused VRAM.

If needed, I'm willing to try altering some of the Python code in the transformers package installed by Oobabooga.

nielsr · December 29, 2023, 4:32pm

Hi,

Yes, you can manually edit the device_map, which is used to place the model on the available devices.

See this page for more info: Handling big models for inference
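
As a rough illustration of what a hand-written device_map could look like for the setup in the question, here is a minimal sketch. It assumes a Llama-style module layout (`model.embed_tokens`, `model.layers.N`, `model.norm`, `lm_head`) and 80 decoder layers, which is what Llama 2 70B based models use; the helper name `build_device_map` and the 44/36 split are hypothetical and would need tuning against actual VRAM usage. It is worth checking the real module names with `model.named_modules()` before relying on it.

```python
def build_device_map(num_layers, layers_on_gpu0):
    """Build an uneven two-GPU device_map for a Llama-style model.

    Hypothetical helper: the module names below are an assumption about
    the model's layout and should be verified against the loaded model.
    """
    if not 0 <= layers_on_gpu0 <= num_layers:
        raise ValueError("layers_on_gpu0 must be between 0 and num_layers")

    # Token embeddings go with the first decoder layers on GPU 0.
    device_map = {"model.embed_tokens": 0}

    # The first `layers_on_gpu0` decoder layers go to GPU 0, the rest to GPU 1.
    for i in range(num_layers):
        device_map[f"model.layers.{i}"] = 0 if i < layers_on_gpu0 else 1

    # Final norm and output head sit with the last layers on GPU 1.
    device_map["model.norm"] = 1
    device_map["lm_head"] = 1
    return device_map


# Example: push 44 of 80 layers onto GPU 0 instead of an even 40/40 split.
device_map = build_device_map(num_layers=80, layers_on_gpu0=44)

# The resulting dict can then be passed when loading the model, e.g.:
#   AutoModelForCausalLM.from_pretrained(model_path, device_map=device_map, ...)
```

A less manual alternative is to keep `device_map="auto"` and cap each card with `max_memory`, for example `max_memory={0: "22GiB", 1: "18GiB"}` (values here are placeholders), so that the automatic placement leaves headroom on the GPU that fills up during training.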
