Hello @aps, yes, you are correct. The logic behind the current setup is that conventional training involves preparing dataloaders, and we fill the relevant DeepSpeed config params from them. For the use case you describe, the current workaround is to pass a dummy dataloader with `batch_size` filled in, which mimics passing the `batch_size` arg directly to the `prepare` call.
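A minimal sketch of that workaround, assuming a PyTorch `DataLoader`: the dataloader's contents are irrelevant here, since only its `batch_size` is read when the DeepSpeed config is filled in. The variable names (`dummy_dataset`, `dummy_dataloader`) are illustrative, not from the library.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 32  # the micro batch size you actually want per GPU

# Contents don't matter; only .batch_size is consulted when Accelerate
# fills train_micro_batch_size_per_gpu in the DeepSpeed config.
dummy_dataset = TensorDataset(torch.zeros(batch_size, 1))
dummy_dataloader = DataLoader(dummy_dataset, batch_size=batch_size)

# Then pass it through prepare alongside your real objects, e.g.:
# model, optimizer, dummy_dataloader = accelerator.prepare(
#     model, optimizer, dummy_dataloader
# )
```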
A cleaner approach would be to skip this step entirely when `train_micro_batch_size_per_gpu` is already provided in the `config_file` via the `DEEPSPEED_CONFIG_FILE` support. Let me know if that would solve the issue. If so, please raise a feature request on the repo.
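For reference, a config-file fragment with the value set explicitly, so there would be nothing left to infer from a dataloader (the surrounding keys shown with `"auto"` are just examples of fields Accelerate can fill in):

```json
{
  "train_micro_batch_size_per_gpu": 32,
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 2
  }
}
```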