Hello,
Sorry if this duplicates an existing question. I did a brief search on the forum but didn't really find an answer.
I decided to fine-tune Longformer on a dataset of 3000 pairs, with inputs up to 4096 tokens each. After some simple back-of-the-envelope calculations I figured I would need around 24 GB of GPU memory (HBM) just to run batch size 1. I don't have such a GPU, so I turned to my old 2-socket 20-core Xeon with 64 GB of RAM.
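Roughly, the arithmetic I had in mind was something like this (the parameter count and the activation share are approximations, not measurements):

```python
# Rough estimate for fine-tuning longformer-base with Adam, batch size 1,
# sequence length 4096. Parameter count (~150M) is approximate.
params = 150e6                    # allenai/longformer-base-4096, roughly
fp32 = 4                          # bytes per float32 value
weights = params * fp32           # model weights
grads   = params * fp32           # gradients
adam    = params * fp32 * 2       # Adam first/second moment buffers
print(f"weights + grads + optimizer: ~{(weights + grads + adam) / 1e9:.1f} GB")
# ~2.4 GB; the rest is activations kept for backprop over 12 layers at
# 4096 tokens (plus attention buffers), which is what pushes the total
# toward the ~24 GB figure above.
```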
I installed PyTorch built with MKL-DNN optimizations for Intel processors… and after running it I realized that fine-tuning on 3000 pairs would take around 100 hours. 100 hours, Carl! Either this Xeon is too old (it only supports AVX) or MKL-DNN doesn't do much for BERT-like PyTorch models.
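For what it's worth, this is how I checked that the build really has MKL-DNN and that all cores are used (the thread count below is just an example and should match the physical core count):

```python
import torch

# Sanity checks for the CPU build: does it have MKL/MKL-DNN, and how many
# threads is it using? On an AVX-only Xeon the fastest oneDNN kernels
# (AVX2/AVX-512) are unavailable anyway.
print(torch.__config__.show())               # build flags, MKL / MKL-DNN versions
print(torch.backends.mkldnn.is_available())  # True if mkl-dnn kernels are present
print(torch.get_num_threads())               # current intra-op thread count
torch.set_num_threads(20)                    # example: set to the physical core count
```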
Anyway, I'm looking into renting a GPU server, which brings me to my questions.
Assuming I need 24 GB of memory for one batch, can I rent a server with 2 GPUs of 16 GB each? Do you know whether PyTorch + CUDA can split the work across 2 GPUs even at batch size 1 without degradation?
Or do I need to look for a single NVIDIA V100 with 32 GB of HBM to solve this problem?
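In case it helps anyone answering: this is the kind of measurement I plan to run on whatever card I rent, to confirm the ~24 GB figure for one forward + backward pass (the model name and num_labels here are just placeholders):

```python
import torch
from transformers import LongformerForSequenceClassification, LongformerTokenizerFast

# Measure peak GPU memory for a single training step at batch size 1,
# sequence length 4096, to see whether 16 GB or 32 GB is really needed.
name = "allenai/longformer-base-4096"
tok = LongformerTokenizerFast.from_pretrained(name)
model = LongformerForSequenceClassification.from_pretrained(name, num_labels=2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

batch = tok("some text " * 2000, truncation=True, max_length=4096,
            padding="max_length", return_tensors="pt").to("cuda")
labels = torch.tensor([1], device="cuda")

torch.cuda.reset_peak_memory_stats()
out = model(**batch, labels=labels)   # forward
out.loss.backward()                   # backward
optimizer.step()                      # optimizer update
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```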
Has anybody already tried Longformer and can share some performance results, with details of the hardware used?
Thanks!!!