Switching batch size and gradient accumulation steps mid-training

Hello, I’m currently training a model on an A40 GPU with a batch size of 8 and gradient accumulation steps of 4. Training on an A100 is almost twice as fast, but it has 40 GB of VRAM while the A40 has 45 GB, so the current per-device batch size may not fit. If I switch to a batch size of 4 with gradient accumulation steps of 8, will the training results stay unaffected, i.e., the same as if I had kept batch size 8 and gradient accumulation 4?
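
To make the arithmetic concrete, here is a minimal sketch of the two configurations I mean (using the Hugging Face `TrainingArguments` API purely for illustration; the framework and argument names are my assumption, not necessarily my exact script). Both settings give the same effective batch size of 8 × 4 = 4 × 8 = 32 samples per optimizer step:

```python
from transformers import TrainingArguments

# Current A40 setup: 8 samples per forward pass, gradients accumulated over 4 passes
# -> effective batch of 32 samples per optimizer update
args_a40 = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
)

# Proposed A100 setup: 4 samples per forward pass, gradients accumulated over 8 passes
# -> also an effective batch of 32 samples per optimizer update
args_a100 = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
)
```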