DeepSpeed with Trainer: number of trainable parameters comes out as 0

With the “codegen-2B-multi” model, using DeepSpeed and gradient checkpointing, the Trainer loop reports the number of trainable parameters as 0. This is where the logging happens:

```python
f" Number of trainable parameters = {sum(p.numel() for p in model.parameters() if p.requires_grad)}"
```

When I compute the same count without the Trainer loop (and without DeepSpeed), it correctly comes out to ~2B parameters. I am unable to debug this issue; can anyone help?
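A likely cause, sketched below under stated assumptions: with DeepSpeed ZeRO stage 3, parameter tensors are partitioned across ranks and replaced with empty placeholders, so `p.numel()` returns 0; DeepSpeed instead records the true element count on a `ds_numel` attribute. The `FakeParam` class here is a hypothetical stand-in for a `torch.nn.Parameter` (so the sketch runs without torch or DeepSpeed installed); `count_trainable` shows a counting helper that handles both cases.

```python
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter. Under ZeRO-3 the real
    tensor is partitioned away, so numel() returns 0 and DeepSpeed attaches
    the true element count as `ds_numel`."""

    def __init__(self, numel, ds_numel=None, requires_grad=True):
        self._numel = numel
        self.requires_grad = requires_grad
        if ds_numel is not None:
            self.ds_numel = ds_numel

    def numel(self):
        return self._numel


def count_trainable(params):
    # Prefer ds_numel when present so ZeRO-3-partitioned params are counted.
    return sum(
        getattr(p, "ds_numel", None) or p.numel()
        for p in params
        if p.requires_grad
    )


# A ZeRO-3-partitioned parameter: placeholder numel() is 0, ds_numel holds 2B.
zero3_params = [FakeParam(0, ds_numel=2_000_000_000)]

# Plain numel()-based counting (what the logging line uses) sees 0:
print(sum(p.numel() for p in zero3_params if p.requires_grad))  # 0

# Counting via ds_numel recovers the intended total:
print(count_trainable(zero3_params))  # 2000000000
```

If this is the cause, the 0 in the log is cosmetic: the model still trains with all its parameters, and only the naive count is wrong.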


I am hitting the same problem! May I ask whether you have solved it?
