When using DeepSpeed, why do I need to pass dataloaders to `accelerator.prepare()`?

If we don’t pass dataloaders to `accelerator.prepare`, we get this error: `You must specify a training or evaluation dataloader in accelerate.prepare() when using DeepSpeed.`

Based on my limited understanding from reading the code, the dataloader seems to be needed only to figure out the batch size per device. I didn’t see any other direct relation between DeepSpeed and preparing the dataloader.

In my case, I don’t want accelerate to prepare the dataloader for me, as I am handling distributed processing and per-worker sharding myself. Would it be reasonable to adjust `prepare` to accept an optional batch size argument directly, so that passing a dataloader is not enforced? Or am I missing a deeper detail in the code/logic?

Hello @aps, yes, you are correct. The logic behind the current setup is that conventional training involves preparing dataloaders, and we fill the relevant DeepSpeed config params from them. For the use case you describe, the current workaround is to pass a dummy dataloader with `batch_size` filled in, which mimics passing a `batch_size` argument directly to the `prepare` call.
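A minimal sketch of that workaround, assuming a PyTorch setup (the dataset contents don't matter; only `batch_size` is read to fill `train_micro_batch_size_per_gpu`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumption: this matches the per-device batch size your real pipeline uses.
per_device_batch_size = 16

# Dummy dataloader whose only purpose is carrying batch_size for accelerate.
dummy_dataset = TensorDataset(torch.zeros(per_device_batch_size))
dummy_dataloader = DataLoader(dummy_dataset, batch_size=per_device_batch_size)

# Hedged: in a real script you would then call something like
#   model, optimizer, _ = accelerator.prepare(model, optimizer, dummy_dataloader)
# and keep iterating over your own sharded dataloader instead of the dummy one.
print(dummy_dataloader.batch_size)
```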

A cleaner approach would be to skip this step when `train_micro_batch_size_per_gpu` is provided in the config file when using the `DEEPSPEED_CONFIG_FILE` support. Let me know if that would solve the issue. If so, please raise a feature request on the repo.


The PR [deepspeed enhancements and fixes by pacman100 · Pull Request #676 · huggingface/accelerate](https://github.com/huggingface/accelerate/pull/676) skips the dataloader requirement when `train_micro_batch_size_per_gpu` is specified in the DeepSpeed config file when using config file support.
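With that change in place, setting the batch size in the DeepSpeed config file is enough. A minimal config fragment might look like this (a sketch; the stage and accumulation values are placeholders you would adapt to your setup):

```json
{
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 1,
  "zero_optimization": {
    "stage": 2
  }
}
```

Because `train_micro_batch_size_per_gpu` is given explicitly, accelerate no longer needs a dataloader in `prepare()` to infer it.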