I’m using run_mlm.py to train a custom BERT model. I have two GPUs available: one with 24 GB of memory and one with 11 GB. I want to use a batch size of 64 on the larger GPU and a batch size of 16 on the smaller one. How can I do this? The --per_device_train_batch_size parameter only accepts a single number. Or can I just pass the combined batch size (80) and let the script figure out how to split the data between the GPUs?
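For reference, here is roughly how I’m launching the script right now (the model name and paths are placeholders, not my actual setup):

```shell
# Both GPUs are visible, so the script uses DataParallel across them,
# and --per_device_train_batch_size applies the same value to each device.
python run_mlm.py \
  --model_name_or_path bert-base-uncased \
  --train_file path/to/train.txt \
  --do_train \
  --per_device_train_batch_size 64 \
  --output_dir path/to/output
```

With this invocation the 11 GB card runs out of memory, since it gets the same per-device batch size as the 24 GB card.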