Hi, I am trying to use Accelerate with multiple GPUs on a single machine together with a Weights & Biases sweep, but I could not find any documentation specifically about this topic.
I tried the following approach, but I am getting errors.
In the main function, the accelerator is initialized and the model hyperparameters are read from the wandb config. The sweep config is then defined and the sweep is started at the bottom of the script:
def main() -> None:
    # Create accelerator for distributed training and logging
    accelerator = Accelerator(
        split_batches=True,
        mixed_precision="fp16",
        log_with="wandb",
    )
    # Initialize wandb run
    if accelerator.is_main_process:
        accelerator.init_trackers(
            project_name=WANDB_PROJECT,
            init_kwargs={
                "wandb": {
                    "entity": WANDB_ENTITY,
                    "dir": WANDB_EXPERIMENT_DIR,
                }
            },
        )
    # Log configuration from wandb tracker
    learning_rate = wandb.config.learning_rate
    base_model_name = wandb.config.base_model_name
    resize_resolution = wandb.config.resize_resolution
    model_type = MODEL_TYPES[base_model_name]
if __name__ == "__main__":
    # Define sweep config
    SWEEP_CONFIG = {
        "method": "random",
        "early_terminate": {
            "type": "hyperband",
            "min_iter": 3,
        },
        "name": "sweep",
        "metric": {"goal": "maximize", "name": "Valid Acc"},
        "parameters": {
            # Log-uniform requires min/max values specified as base-e exponents
            "learning_rate": {
                "values": [
                    1e-4,
                    1e-3,
                    1e-2,
                ]
            },
            "base_model_name": {
                "values": [
                    "facebook/convnext-tiny-224",
                    "facebook/convnext-small-224",
                    "microsoft/swinv2-tiny-patch4-window8-256",
                ]
            },
            "resize_resolution": {"values": ["model_base", "sd"]},
        },
    }
    # Initialize sweep by passing in config
    sweep_id = wandb.sweep(
        sweep=SWEEP_CONFIG,
        project=WANDB_PROJECT,
        entity=WANDB_ENTITY,
    )
    # Start sweep job.
    wandb.agent(sweep_id, function=main, count=1)
Later, from the CLI, I configure the accelerator parameters with "accelerate config" and run the following command: accelerate launch
I am getting the following output and error:
Create sweep with ID: z18ok842
Sweep URL:
Create sweep with ID: 350at2y9
Sweep URL:
Create sweep with ID: sca3d4qa
Sweep URL:
wandb: Agent Starting Run: 1oaggrmg with config:
wandb: base_model_name: facebook/convnext-tiny-224
wandb: learning_rate: 0.01
wandb: resize_resolution: sd
wandb: Agent Starting Run: y3o3i8tn with config:
wandb: base_model_name: facebook/convnext-small-224
wandb: learning_rate: 0.001
wandb: resize_resolution: sd
wandb: Agent Starting Run: vdcqv0d4 with config:
wandb: base_model_name: facebook/convnext-small-224
wandb: learning_rate: 0.01
wandb: resize_resolution: model_base
Create sweep with ID: vp6bdpbb
Sweep URL:
wandb: Agent Starting Run: xw1qzfpw with config:
wandb: base_model_name: facebook/convnext-tiny-224
wandb: learning_rate: 0.01
wandb: resize_resolution: model_base
Run xw1qzfpw errored: Error('You must call wandb.init() before wandb.config.learning_rate')
wandb: ERROR Run xw1qzfpw errored: Error('You must call wandb.init() before wandb.config.learning_rate')
Run y3o3i8tn errored: Error('You must call wandb.init() before wandb.config.learning_rate')
wandb: ERROR Run y3o3i8tn errored: Error('You must call wandb.init() before wandb.config.learning_rate')
Run vdcqv0d4 errored: Error('You must call wandb.init() before wandb.config.learning_rate')
wandb: ERROR Run vdcqv0d4 errored: Error('You must call wandb.init() before wandb.config.learning_rate')
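The four "Create sweep" lines make me think that every process started by the launcher runs the bottom of the script and creates its own sweep. Below is a minimal sketch of how I could check which process I am in before creating the sweep. It assumes that the launcher exports a RANK environment variable for each worker, as torch-style distributed launchers do. The helper name is_main_process_from_env is hypothetical and not part of Accelerate or wandb.

```python
import os


def is_main_process_from_env(env=None) -> bool:
    """Return True when this process looks like global rank 0.

    Assumption: the launcher exports RANK for every worker process.
    If RANK is missing (e.g. a plain `python script.py` run), the
    process is treated as the main one.
    """
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")) == 0


if __name__ == "__main__":
    if is_main_process_from_env():
        # Only this process would create the sweep and start the agent.
        print("rank 0: would call wandb.sweep(...) here")
    else:
        print("non-zero rank: would skip sweep creation")
```

This would only avoid the duplicate sweeps. The other processes would still need the hyperparameters of the run somehow, and that is the part I am unsure about.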
Do you have a recommendation on how to integrate accelerate with wandb sweep?
Thank you.