DeepSpeed giving AssertionError

I am facing an issue when using DeepSpeed to fine-tune the StarCoder model. I am following the steps in the article Creating a Coding Assistant with StarCoder (section Fine-tuning StarCoder with DeepSpeed ZeRO-3) exactly, but I get the error:

“AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 4 * 8 * 1”

After some searching I found a link explaining the cause: [BUG] batch_size check failed with zero 2 (deepspeed v0.9.0) · Issue #3228 · microsoft/DeepSpeed · GitHub. However, even with the DeepSpeed version that issue reports as working (v0.9.0), I get the same error. I have tried different versions of deepspeed and accelerate but couldn’t fix the issue. Does anyone have any suggestions? Thanks in advance.
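For context, the assertion is just an arithmetic consistency check: DeepSpeed requires that train_batch_size equal micro_batch_per_gpu × gradient_accumulation_steps × world_size. A minimal sketch of the same check, plugging in the values shown in the error message (4, 8, and 1 are taken from the error; 256 is the configured train_batch_size):

```shell
# Values reported in the AssertionError above.
train_batch_size=256
micro_batch_per_gpu=4
gradient_acc_steps=8
world_size=1

# DeepSpeed asserts: train_batch_size == micro * grad_acc * world_size
effective=$(( micro_batch_per_gpu * gradient_acc_steps * world_size ))
echo "configured train_batch_size: $train_batch_size"
echo "effective batch size:        $effective"
```

Here the product is 32, not 256, which is exactly what the assertion complains about — the launcher is seeing a different world size or batch settings than the DeepSpeed config expects.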

cc @muellerzr — this is possibly related to the recent refactoring of Trainer to use accelerate as its backend.

@harik68 I think you should be able to run the StarCoder script by pinning accelerate==0.18.0 and transformers==4.28.1.
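Concretely, pinning would look something like this (versions taken from the reply above; run in the environment you launch the script from):

```shell
pip install "accelerate==0.18.0" "transformers==4.28.1"
```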


cc @smangrul