The same hyperparameters give worse results with DeepSpeed than without DeepSpeed

dodge · July 18, 2022, 9:35am

I’m training a model (DPCNN). Without DeepSpeed I use a batch size of 512; with DeepSpeed the actual batch size is 512/8 = 64, since I have 8 GPUs. However, the loss and accuracy with DeepSpeed are far worse than without it. The two experiments use the same code and the same hyperparameters, except for the batch size. Does anyone have an idea that could explain this?
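To make the batch-size arithmetic above concrete, here is a minimal sketch in plain Python. It relies on the relation DeepSpeed documents for its config, `train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * number of GPUs`. The helper names are hypothetical and only for illustration, and gradient accumulation of 1 is an assumption, since the post does not mention it.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Global batch size seen by one optimizer step (hypothetical helper)."""
    return per_gpu_batch * num_gpus * grad_accum_steps


def per_gpu_batch_size(global_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Per-GPU micro batch needed to reach a target global batch (hypothetical helper)."""
    divisor = num_gpus * grad_accum_steps
    if global_batch % divisor != 0:
        raise ValueError("global batch must be divisible by num_gpus * grad_accum_steps")
    return global_batch // divisor


# Numbers from the post: batch size 512 split over 8 GPUs.
micro_batch = per_gpu_batch_size(512, 8)
print(micro_batch)                           # per-GPU batch size
print(effective_batch_size(micro_batch, 8))  # global batch size per optimizer step
```

If both runs end up with the same effective batch size, a batch-size mismatch alone would not explain the gap, and other settings (learning-rate scaling, precision, optimizer configuration) would be worth comparing.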

smangrul · July 18, 2022, 12:55pm

Hello,

We need a small script that we can run to reproduce the behaviour. In our experiments, DeepSpeed works as expected. For your reference, please go through this blog: Accelerate Large Model Training using DeepSpeed (huggingface.co), which covers the GLUE task, where we see on-par results with and without DeepSpeed.

mintuos · November 13, 2023, 1:03pm

I have the same question as you. In my code, the accuracy is the same as without DeepSpeed, but the loss is very high.
I wonder whether you have solved this problem. Could you give me some suggestions? Thank you.
