I’m looking to fine-tune allenai/led-base-16384 on a private dataset whose inputs are 16824 tokens long, using 4 NVIDIA Tesla V100S GPUs (32GB each). With DeepSpeed ZeRO-3 and NVMe parameter and optimiser offloading, I get OOM errors whenever I increase max_source_length beyond 2048.
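For context, here is a minimal sketch of the kind of ZeRO-3 + NVMe offload settings I’m describing, as a Python dict that can be passed to the Trainer via TrainingArguments(deepspeed=ds_config). The NVMe path is a placeholder and the "auto" values assume the Hugging Face Trainer integration, so treat this as illustrative rather than my exact config:

```python
# Illustrative ZeRO-3 config with parameter and optimiser offload to NVMe.
# "/local_nvme" is a placeholder path; "auto" values are resolved by the
# Hugging Face Trainer integration.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme", "pin_memory": True},
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
}
```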
I’m not sure why this happens and would like to understand how to predict the memory requirements of fine-tuning LED on inputs of length 16824. The DeepSpeed memory estimator suggests that very little GPU RAM is needed with NVMe offloading (output below), yet that doesn’t match what I see in practice. Could this be because the estimator assumes a sequence length of 1024 rather than 16824?
```
HW: Setup with 1 node, 4 GPUs per node.
SW: Model with 459M total params, 51M largest layer params.
  per CPU  |  per GPU |   Options
  11.56GB  |   0.19GB | offload_param=cpu , offload_optimizer=cpu , zero_init=1
  11.56GB  |   0.19GB | offload_param=cpu , offload_optimizer=cpu , zero_init=0
  10.28GB  |   0.41GB | offload_param=none, offload_optimizer=cpu , zero_init=1
  10.28GB  |   0.41GB | offload_param=none, offload_optimizer=cpu , zero_init=0
   1.15GB  |   2.12GB | offload_param=none, offload_optimizer=none, zero_init=1
  10.28GB  |   2.12GB | offload_param=none, offload_optimizer=none, zero_init=0
```
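The output above comes from DeepSpeed’s ZeRO-3 model-state estimator; a rough sketch of the invocation (the exact call may differ, but the checkpoint name and the 1-node / 4-GPU counts follow the setup described at the top of this post):

```python
# Estimate ZeRO-3 memory needs for params, gradients and optimiser states
# only; this helper does not model activations.
from transformers import AutoModelForSeq2SeqLM
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=4, num_nodes=1)
```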