The trainer args that are throwing an out-of-memory error:
training_args = TrainingArguments(
    output_dir="./vit-cifar10",
    per_device_train_batch_size=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=4,
    fp16=True,
    logging_steps=10000,
    learning_rate=2e-4,
    save_total_limit=2,
    remove_unused_columns=False,
    push_to_hub=False,
    report_to="tensorboard",
    load_best_model_at_end=True,
)
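One difference I notice: this run never sets per_device_eval_batch_size or eval_accumulation_steps, so the epoch-end evaluation uses the defaults and, as I understand it, keeps all accumulated predictions on the GPU until the evaluation loop finishes. A minimal sketch with those two knobs made explicit (both are standard TrainingArguments parameters; the values are illustrative guesses, not tested):

from transformers import TrainingArguments

# Sketch: same epoch-strategy config, plus the two eval-memory knobs.
training_args = TrainingArguments(
    output_dir="./vit-cifar10",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,   # explicit; the default is also 8
    eval_accumulation_steps=32,     # move accumulated predictions to CPU every 32 eval steps
    evaluation_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=4,
    fp16=True,
    learning_rate=2e-4,
    save_total_limit=2,
    remove_unused_columns=False,
    load_best_model_at_end=True,
)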
The working trainer args:
training_args = TrainingArguments(
    output_dir="./vit-cifar10",
    per_device_train_batch_size=32,
    evaluation_strategy="steps",
    num_train_epochs=1,
    fp16=True,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=10,
    learning_rate=2e-4,
    save_total_limit=2,
    remove_unused_columns=False,
    push_to_hub=False,
    report_to="tensorboard",
    load_best_model_at_end=True,
)
I'm not sure why it runs out of memory with the epoch strategy. I'm using a Tesla P40 with 24 GB of memory.
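In case it helps with diagnosis, here is a minimal sketch of a callback that logs peak GPU memory after each evaluation pass (it assumes PyTorch's CUDA memory stats; the callback class itself is my own illustrative helper, not part of transformers):

import torch
from transformers import TrainerCallback

class MemoryLoggerCallback(TrainerCallback):
    # Illustrative helper: print peak CUDA memory after each evaluation.
    def on_evaluate(self, args, state, control, **kwargs):
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f"step {state.global_step}: peak CUDA memory {peak_gb:.2f} GB")
        torch.cuda.reset_peak_memory_stats()

Passing callbacks=[MemoryLoggerCallback()] to the Trainer should show whether the spike happens during the epoch-end evaluation rather than during training itself.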