There seems to be not a single sample in your epoch_iterator, stopping training at step 0! This is expected if you're using an IterableDataset and set num_steps (5000000) higher than the number of available samples

Hello, I’m using Trainer API, and I got this error:

***** Running training *****
  Num examples = 80000000
  Num Epochs = 9223372036854775807
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 5000000

  0%|          | 0/5000000 [00:00<?, ?it/s]

There seems to be not a single sample in your epoch_iterator, stopping training at step 0! 
This is expected if you're using an IterableDataset and set num_steps (5000000) higher than the number of available samples.

My TrainingArguments looks like this:

TrainingArguments(
        output_dir="./tmp",
        overwrite_output_dir=True,
        local_rank=args.local_rank,
        learning_rate=0.00025,
        per_device_train_batch_size=8,  # batch size for training
        per_device_eval_batch_size=8,
        save_steps=10000,  # a checkpoint is saved every 10,000 steps
        warmup_steps=2000,  # number of warmup steps for the learning rate scheduler
        max_steps=5000000,
        fp16=False,
        fp16_opt_level="O1",  # Apex AMP level: the letter O followed by a digit (O0-O3)
        sharded_ddp="zero_dp_3 auto_wrap",
        dataloader_num_workers=8,
)
The error says that I set more steps than there are available samples, but my IterableDataset doesn't define __len__(), so I don't know where the Trainer gets the number of samples from. Does anyone have an idea? Thanks.
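For reference, a minimal IterableDataset of the kind I mean looks like this (a simplified sketch, not my actual code; it deliberately defines no __len__()):

from torch.utils.data import IterableDataset

class StreamingTextDataset(IterableDataset):
    """Streams tokenized lines from a file. No __len__() is defined,
    so the Trainer cannot infer the dataset size up front."""

    def __init__(self, file_path, tokenizer, block_size=512):
        self.file_path = file_path
        self.tokenizer = tokenizer
        self.block_size = block_size

    def __iter__(self):
        with open(self.file_path, encoding="utf-8") as f:
            for line in f:
                # Tokenize one line at a time and yield plain tensors.
                enc = self.tokenizer(
                    line,
                    truncation=True,
                    max_length=self.block_size,
                    return_tensors="pt",
                )
                yield {
                    "input_ids": enc["input_ids"].squeeze(0),
                    "attention_mask": enc["attention_mask"].squeeze(0),
                }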

Hey, were you able to solve this or figure out what was causing the issue?
I'm facing the same problem here.

I also ran into this. It turned out I had a filter that was removing all the samples. Make sure your dataset is actually returning data.
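A quick way to check, before launching the Trainer, is to pull one sample by hand (a minimal sketch; train_dataset stands in for whatever you pass to the Trainer):

# Sanity-check the iterable dataset before handing it to the Trainer.
# If the dataset (or a filter upstream of it) yields nothing, you'll
# find out here instead of at a silent stop at step 0.
it = iter(train_dataset)
try:
    sample = next(it)
    print("First sample keys:", list(sample.keys()))
except StopIteration:
    print("Dataset yielded no samples, check your filters!")

As for the "Num examples = 80000000" line in the log: when the dataset has no length, the Trainer (as far as I can tell) simply reports max_steps times the total train batch size, i.e. 5,000,000 x 16 = 80,000,000, so that number is not evidence that any data was actually read.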