LM example run_mlm.py: memory usage is inconsistent across GPUs

When I run run_mlm.py, the memory usage of GPU 0 is twice that of the other GPUs.
Is this normal?
Thanks for any advice!
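For reference, my launch command looks roughly like this (the model name, dataset, batch size, and output path below are placeholders, not my exact values):

```bash
python run_mlm.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 8 \
    --fp16 \
    --output_dir ./mlm-output
```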

Could it be because I used fp16?

I would also like to know what to do about this.