Trainer epoch does not go through all training data?

Hello

I’m training a model with the transformers Trainer, but when I set the number of epochs to e.g. 1000, the training seems to do only 1000 steps, whereas an epoch is normally one full pass of the model over the entire dataset. How can I use the Trainer so that each epoch goes through the full training dataset (and so that I can see the progression of these epochs)?

Thanks!


Hi there!

Please post the command/code you are executing as we can’t really help without that.

Sure, sorry! I thought it was not a code-specific question but rather one about the parameters of the Trainer class :slight_smile:

Here is the code I use:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./out',
    num_train_epochs=1000,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=training_data,
    eval_dataset=validation_data,
)

trainer.train()

I would expect 1000 epochs, each going through the full train_dataset, but the output runs very fast and prints “epoch 1/1000, epoch 2/1000, epoch 3/1000, …”, which gives the impression that an epoch is simply a single training step rather than an actual epoch. With 20k training instances and a batch size of 2, I would expect roughly 10k steps per epoch, which normally takes some time.

I wanted to check that your num_train_epochs wasn’t being overridden by another parameter like max_steps. The code looks correct, and the logs do indicate you are going through the epochs. Double-check the length of your dataset to make sure it hasn’t been reduced to something small.
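
A quick way to see what the Trainer will actually iterate over is to print a few numbers right before trainer.train(). This is just a sketch using the training_data and training_args objects from your snippet (with gradient accumulation or multiple GPUs the per-epoch step count would be smaller):

# Size of the dataset as the Trainer sees it
print(len(training_data))   # should be around 20000

# Expected optimization steps per epoch
print(len(training_data) // training_args.per_device_train_batch_size)   # ~10000 with batch size 2

# max_steps defaults to -1; any positive value overrides num_train_epochs
print(training_args.max_steps)

If the first print already gives a small number, the problem is in how training_data is built rather than in the TrainingArguments.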

My dataset seems to have the right size. Looking at the Wandb logs, it does indeed seem that only 1 step is performed in each epoch… I don’t know what could be wrong.
My code basically follows this post.
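
As far as I understand, the dataset in that post is a plain torch Dataset and mine follows the same pattern. Roughly like this (a simplified sketch, not my exact code; the names are placeholders):

import torch

class ReviewsDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings   # dict of tensors from the tokenizer
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        # the Trainer uses this to size an epoch, so it must be the number
        # of examples, not e.g. the number of keys in the encodings dict
        return len(self.labels)

If __len__ ended up returning something tiny, I suppose that would explain the single step per epoch, so I will double-check that as well.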
