DeepSpeed with Trainer: number of trainable parameters reported as 0

With the "codegen-2B-multi" model using DeepSpeed and gradient checkpointing, the Trainer loop reports the number of trainable parameters as 0 (in the "Number of trainable parameters = ..." line logged at the start of training).
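For context, a common cause of this (assuming you are running ZeRO stage 3): under ZeRO-3 each rank holds only an empty placeholder tensor per parameter, so a plain `p.numel()` count comes out as 0. Below is a minimal sketch of a count that tolerates partitioning; the helper name is mine, and `ds_numel` is the attribute DeepSpeed attaches to partitioned parameters:

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    """Count trainable parameters, tolerating DeepSpeed ZeRO-3 partitioning."""
    total = 0
    for p in model.parameters():
        if not p.requires_grad:
            continue
        # Under ZeRO-3 the local tensor is an empty placeholder (numel() == 0);
        # DeepSpeed records the true element count on `ds_numel`.
        total += p.ds_numel if hasattr(p, "ds_numel") else p.numel()
    return total
```

If this returns ~2B while a plain `numel()` sum returns 0, the 0 in the Trainer log is a reporting artifact of parameter partitioning rather than the model actually having nothing to train.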

When I compute the same count without the Trainer loop (and without DeepSpeed etc.), it correctly outputs ~2B parameters. I'm unable to debug this issue; can anyone help?
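For reference, the direct count mentioned above is presumably the standard idiom (an assumption, since the snippet isn't included in the post):

```python
# Plain parameter count; accurate only when weights are not ZeRO-3 partitioned.
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_trainable:,}")
```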

I'm running into the same problem! May I ask whether you have solved it?