Does anyone have any experience with this? I want to launch a hyper-parameter tuning job on multiple nodes, each node has 8 GPUs. The resource for each trial is {“gpu”:8}
Does anyone have any experience with this? I want to launch a hyper-parameter tuning job on multiple nodes, each node has 8 GPUs. The resource for each trial is {“gpu”:8}