Hugging Face on Databricks

I have a multi-GPU cluster set up on Databricks: a driver plus 2 workers, each node with 2 GPUs, for a total of 6 GPUs. I want to run distributed training and inference on this cluster. Using the distributed modules, I am only able to use the GPUs on the driver node.

Is there any way I can make use of all 6 GPUs? (I don’t have terminal access to the cluster.)

Thanks in advance.
