The used dataset had no length, returning gathered tensors. You should drop the remainder yourself

sriramgs · October 18, 2024, 6:48pm

Hi,

When I train my model with drop_last=True in the dataloader, I get the INFO message below. Is it a cause for concern?

[accelerate.accelerator][INFO] - The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.

When I trained with drop_last=False, I did not get this message.

Please advise.

John6666 · October 19, 2024, 7:41am

This message… I think it’s pretty rare. I couldn’t find any other reports of it.

github.com/huggingface/accelerate/blob/main/src/accelerate/accelerator.py#L2493

              data = gather_object(input_data)
          else:
              data = self.gather(input_data)
          
          try:
              if self.gradient_state.end_of_dataloader:
                  # at the end of a dataloader, `gather_for_metrics` regresses to
                  # `gather` unless the dataset has a remainder so log.
                  if self.gradient_state.remainder == -1:
                      logger.info(
                          "The used dataset had no length, returning gathered tensors. You should drop the remainder yourself."
                      )
                      return data
                  elif self.gradient_state.remainder > 0:
                      # Last batch needs to be truncated on distributed systems as it contains additional samples
                      def _adjust_samples(tensor):
                          return tensor[: self.gradient_state.remainder]
          
                      if use_gather_object:
                          # gather_object put the objects in a list
                          return _adjust_samples(data)
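
To make the branch above easier to follow, here is a minimal sketch of the same decision in plain Python. The function name `truncate_gathered` and its arguments are hypothetical stand-ins for `gather_for_metrics` and the `gradient_state` fields; it models only the excerpt quoted here, not the full library code.

```python
def truncate_gathered(data, end_of_dataloader, remainder):
    """Simplified, hypothetical model of the quoted gather_for_metrics branch."""
    if not end_of_dataloader:
        # Mid-epoch batches come back exactly as gathered.
        return data
    if remainder == -1:
        # Remainder unknown: this is where the INFO message is logged.
        # Nothing is trimmed, so any padded duplicates are left to the caller.
        return data
    if remainder > 0:
        # Known remainder: keep only the real samples of the last batch.
        return data[:remainder]
    # remainder == 0: the last batch was full, nothing to trim.
    return data


gathered = [10, 11, 12, 13]  # pretend: 2 processes x batch size 2
print(truncate_gathered(gathered, True, -1))  # [10, 11, 12, 13]
print(truncate_gathered(gathered, True, 3))   # [10, 11, 12]
```

So the message only says that the remainder is unknown and that nothing was trimmed. One possible explanation for seeing it with drop_last=True, which I have not verified against every Accelerate version, is that the prepared dataloader only computes the remainder when drop_last is off, so it stays at -1 otherwise. In that case no padded samples exist and the message would be harmless.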

sriramgs · October 19, 2024, 11:19am

Thanks John for pointing to the code. I am using gather_for_metrics() in my code. Is there any way to solve this issue? I have added a snippet of my evaluation function.

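def evaluate(self, model, criterion, dataloader):
    losses = AverageMeter('loss', ':.4f')
    accuracy = AverageMeter('acc', ':.4f')
    model.eval()

    with torch.no_grad():
        for i, batch in enumerate(dataloader):
            img, label = batch[0], batch[1]

            y_pred = model(img)

            # The loss is computed on this process's shard of the batch only.
            loss = criterion(y_pred, label.float())
            losses.update(loss.item(), img.size(0))

            # Gather predictions and labels from all processes before scoring.
            # `accelerate` is assumed to be the Accelerator instance here.
            outputs, targets = accelerate.gather_for_metrics((y_pred, label))
            # Weight by the gathered sample count, since accuracy covers all processes.
            accuracy.update(binary_accuracy(outputs, targets).item(), targets.size(0))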

John6666 · October 19, 2024, 12:10pm

Hmmm, it doesn’t look like a problem, so I went back to the library code.
It seems the message is triggered by `remainder` being -1.

github.com/huggingface/accelerate/blob/main/src/accelerate/state.py#L1141

                  raise AttributeError(f"'AcceleratorState' object has no attribute '{name}'")
          
          
          class GradientState:
              """
              Singleton class that has information related to gradient synchronization for gradient accumulation
          
              **Available attributes:**
          
                  - **end_of_dataloader** (`bool`) -- Whether we have reached the end the current dataloader
                  - **remainder** (`int`) -- The number of extra samples that were added from padding the dataloader
                  - **sync_gradients** (`bool`) -- Whether the gradients should be synced across all devices
                  - **active_dataloader** (`Optional[DataLoader]`) -- The dataloader that is currently being iterated over
                  - **dataloader_references** (`List[Optional[DataLoader]]`) -- A list of references to the dataloaders that are
                      being iterated over
                  - **num_steps** (`int`) -- The number of steps to accumulate over
                  - **adjust_scheduler** (`bool`) -- Whether the scheduler should be adjusted to account for the gradient
                      accumulation
                  - **sync_with_dataloader** (`bool`) -- Whether the gradients should be synced at the end of the dataloader
                      iteration and the number of total steps reset
                  - **is_xla_gradients_synced** (`bool`) -- Whether the XLA gradients have been synchronized. It is initialized

Could it be this? It seems the only way around it is to drop the remainder manually.

github.com/huggingface/accelerate

`gather_for_metrics` and `GradientState` don't work with arbitrary evaluation intervals

Opened 05:27AM - 09 Jan 23 UTC by ernestchu · closed 08:04PM - 07 Mar 23 UTC · labels: bug, wip

I try to run evaluation on the evaluation_dataloader (entire eval epoch) every given steps (instead of running after an entire train epoch). After evaluation, the `GradientState` used by `gather_for_metrics` seems to stuck as-is in the last epoch of evaluation, which is

```
Sync Gradients: True
At end of current dataloader: True
Extra samples added: 20
```

and it cause `gather_for_metrics` drop samples on train_loader for the following training until it reaches the end of the train epoch. I noticed that the `gradient_state` in `Accelerator` main instance and both `DataLoaderShard` are synced, in a way I can't find in the accelerate source code.

Can someone elaborate what the design philosophy of `GradientState` and how it syncs between the accelerator and dataloaders. Additionally, if it is possible to do evaluation in the middle of the train epoch like I mentioned above. Thanks!
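
For the manual route, a rough sketch of dropping the remainder yourself: count how many samples you have already kept and trim each gathered batch so the total never exceeds the dataset length. The helper `drop_padded_remainder` is hypothetical (not an Accelerate API), and the sketch assumes the padded duplicates sit at the end of the last gathered batch. Plain lists stand in for tensors here; slicing works the same way on the first dimension of a tensor.

```python
def drop_padded_remainder(gathered, samples_seen, dataset_len):
    """Keep only the gathered samples that still fit within the dataset length."""
    remaining = max(dataset_len - samples_seen, 0)
    return gathered[:remaining]


# Toy run: 10 samples, 2 processes x batch size 3 = 6 gathered samples per step.
dataset_len = 10
steps = [
    [0, 1, 2, 3, 4, 5],
    [6, 7, 8, 9, 0, 1],  # last step padded with two repeated samples
]

kept = []
for gathered in steps:
    kept.extend(drop_padded_remainder(gathered, len(kept), dataset_len))

print(kept)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In a real evaluation loop you would apply the same slice to both predictions and labels after gathering them, then update the metric with the trimmed values.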

Innovator2K · December 26, 2024, 11:11am

I’m having a similar message but without using Accelerate:

2024-12-26 13:10:06,872 - INFO - The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.

Does anyone know how to get rid of this while keeping dataloader_drop_last=True in the TrainingArguments?
The funny thing is that I’m using Hugging Face’s Dataset class, which does have __len__(). Moreover, my compute_metrics() and preprocess_logits_for_metrics() both show that the last batch is indeed dropped, so everything seems fine except for the message.
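
If the evaluation results are fine and only the log line is unwanted, one option is to filter it out with the standard `logging` module. This is a sketch under the assumption that the message is emitted through the `accelerate.accelerator` logger, as the `[accelerate.accelerator][INFO]` prefix in the first post suggests. The class name `DropRemainderNoticeFilter` is made up for illustration.

```python
import logging

# Start of the message we want to hide; everything else passes through.
NOTICE = "The used dataset had no length"


class DropRemainderNoticeFilter(logging.Filter):
    """Drop only the 'dataset had no length' notice, keep other records."""

    def filter(self, record: logging.LogRecord) -> bool:
        return NOTICE not in record.getMessage()


# Assumed logger name, taken from the log prefix shown earlier in the thread.
logging.getLogger("accelerate.accelerator").addFilter(DropRemainderNoticeFilter())
```

A filter is narrower than raising the logger's level to WARNING, which would also hide every other INFO message from that module. Run it once before training or evaluation starts.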
