Longformer on 1 GPU or multi-GPU

AndrewGeroge · September 27, 2020, 1:49pm

Hello,

Sorry if I'm duplicating a question. I did a brief search of the forum but did not find an answer.
I decided to fine-tune Longformer on a dataset of 3000 pairs. Each input is up to 4096 tokens long. After some simple calculations I figured out that around 24 GB of GPU memory (HBM) is needed to run with batch size 1. I do not have such a GPU, so I looked at my old 2-socket, 20-core Xeon with 64 GB of RAM.
I installed a PyTorch build optimized with MKL-DNN for Intel processors… and, you know, after running it I realized that fine-tuning on 3000 pairs would take around 100 hours. 100 hours, Carl! Either this Xeon is too old (only AVX is supported) or MKL-DNN does not optimize BERT-like PyTorch models.

Anyway, I’m looking into renting a GPU server, which finally brings me to my questions.

Assuming I need 24 GB of memory for a batch of 1, can I take a server with 2 GPUs with 16 GB each? Do you know whether PyTorch + CUDA can split the work across 2 GPUs even for batch size = 1 without degradation?
Or do I need to look for a single Nvidia V100 with 32 GB of HBM to solve this problem?

Has anybody already tried Longformer and can share some performance results, with details of the hardware used?

Thanks!!!

nlp · May 31, 2022, 10:06am

Have you noticed any progress on this front, i.e. how to enable multiple GPUs for Longformer fine-tuning? Even though I have 4 GPUs enabled, training only uses a single GPU.

nbroad · May 31, 2022, 1:33pm

I’ve got it to work with longformer. Are you using Trainer or accelerate?

nlp · May 31, 2022, 2:14pm

I am using Trainer

nbroad · May 31, 2022, 6:27pm

What command did you use to launch the training?

nlp · June 1, 2022, 12:28am

I am just calling trainer.train() inside the notebook. n_gpus=4 is set in the Training Args, but it only uses the first GPU to train.

nbroad · June 3, 2022, 1:02am

You shouldn’t have to specify n_gpus; the Trainer will automatically use all available devices (unless the environment variable CUDA_VISIBLE_DEVICES is set so that only one GPU is visible).
Try launching it through a script using this:

python -m torch.distributed.launch \
    --nproc_per_node number_of_gpu_you_have path_to_script.py \
    --all_arguments_of_the_script
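
Before launching, it may help to confirm how many GPUs the process can actually see, since a restrictive CUDA_VISIBLE_DEVICES would explain training on one device only. The snippet below is only a rough sketch: the helper names parse_visible_devices and describe_gpu_visibility are made up for illustration, and the parsing assumes the usual comma-separated form of the variable.

```python
import os


def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES value into its device entries.

    Returns None when the variable is unset (no restriction applied),
    and an empty list when it is set to an empty string.
    """
    if value is None:
        return None
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def describe_gpu_visibility():
    """Print the environment restriction and what PyTorch reports."""
    restriction = parse_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES"))
    if restriction is None:
        print("CUDA_VISIBLE_DEVICES is not set, so no GPUs are hidden by it")
    else:
        print(f"CUDA_VISIBLE_DEVICES limits this process to: {restriction}")

    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("torch.cuda.device_count() =", torch.cuda.device_count())


if __name__ == "__main__":
    describe_gpu_visibility()
```

Running it both inside the notebook and inside the launched script makes it easy to compare what each environment sees.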
