CUDA memory imbalanced across multiple GPUs, and multiple processes on device 0

Hello, I am using 8 x V100 GPUs to finetune the deberta-v3-large model for sequence classification, with transformers and the Trainer API. After running `accelerate config` to specify a DeepSpeed config, I start training with `accelerate launch main.py`, but the processes are not initialized properly and the CUDA memory usage is not balanced at all.
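For context on the "multiple processes on device 0" symptom: under `accelerate launch`, each spawned worker receives a `LOCAL_RANK` environment variable and is expected to bind to the GPU matching that rank. A minimal sketch of that per-process device selection (this is illustrative, not code from my `main.py`; the `"0"` fallback is an assumption):

```python
import os

# Under `accelerate launch`, each worker process gets LOCAL_RANK=0..N-1.
# If several processes all appear on cuda:0 in nvidia-smi, it usually
# means the device was never selected per local rank.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # "0" fallback is an assumption
device = f"cuda:{local_rank}"
print(device)  # one distinct device string per worker, e.g. cuda:0, cuda:1, ...
```

Normally the Trainer/Accelerate stack handles this binding itself, which is why I suspect something in my config is off.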


Here is my config file:

It is pretty weird; I am just using the Hugging Face Trainer API in my code.