Hello, I am using 8 × V100 GPUs to fine-tune the deberta-v3-large model for sequence classification. I train with transformers and the Trainer API. After running accelerate config
to specify a DeepSpeed config, I run accelerate launch main.py
to start training, but the process is not initialized properly, and the CUDA memory usage is not balanced across the GPUs at all.
Here is my config file
It is pretty weird, since my code just uses the Hugging Face Trainer API.