With the “codegen-2B-multi” model, using DeepSpeed and gradient checkpointing, the Trainer loop reports the number of trainable parameters as 0. This is where the logging happens:
# Train!
logger.info("***** Running training *****")
logger.info(f" Num examples = {num_examples}")
logger.info(f" Num Epochs = {num_train_epochs}")
logger.info(f" Instantaneous batch size per device = {args.per_device_train_batch_size}")
logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_train_batch_size}")
logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}")
logger.info(f" Total optimization steps = {max_steps}")
logger.info(
    f" Number of trainable parameters = {sum(p.numel() for p in model.parameters() if p.requires_grad)}"
)
self.state.epoch = 0
start_time = time.time()
epochs_trained = 0
steps_trained_in_current_epoch = 0
steps_trained_progress_bar = None
# Check if continuing training from a checkpoint
if resume_from_checkpoint is not None and os.path.isfile(
When I compute the same count outside the Trainer loop (without DeepSpeed etc.), it correctly reports roughly 2B parameters. I am unable to debug this issue; can anyone help?
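For reference, a minimal sketch of the standalone count described above, assuming the checkpoint is loaded with transformers' AutoModelForCausalLM (the original post does not show the exact loading code, so the model ID and loading call here are assumptions):

from transformers import AutoModelForCausalLM

# Load the checkpoint without DeepSpeed; plain torch parameters report their
# real local sizes, so this prints roughly 2B for codegen-2B-multi.
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-2B-multi")
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Number of trainable parameters = {trainable}")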
I am running into the same problem! May I ask whether you have solved it?
czsun (March 22, 2023, 7:42am):
logger.info(f" Number of trainable parameters = {sum(p.numel() + p.ds_numel for p in model.parameters() if p.requires_grad)}"
)
but I got an error p does not have attribute ds_numel.
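A guess at what is going on (not confirmed anywhere in this thread): under DeepSpeed ZeRO stage 3 the parameters are partitioned across ranks, the local tensor can be empty so p.numel() returns 0, and DeepSpeed stores the full size in a ds_numel attribute on the parameters it manages; parameters it has not wrapped carry no such attribute, which would explain the AttributeError above. A hedged sketch of the same logging line (a drop-in for the line inside the Trainer loop, using the surrounding logger and model) that guards the attribute access and avoids double-counting, since numel() + ds_numel would count unpartitioned parameters twice:

# Read ds_numel only when DeepSpeed has attached it, falling back to the
# plain torch numel() otherwise.
logger.info(
    " Number of trainable parameters = "
    f"{sum(p.ds_numel if hasattr(p, 'ds_numel') else p.numel() for p in model.parameters() if p.requires_grad)}"
)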
Same problem here. Is there any solution?