SFTTrainer training very slow. Is this training speed expected?

domce20 · February 8, 2025, 6:49pm

Hello,

For anyone interested in the answer: this is the expected training speed for the provided hardware and model size. What I ended up doing to improve the speed substantially was:

1. Lower the context length from 2048 to 512.
2. Use mixed-precision training.
3. Use a quantized optimizer.
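
For reference, here is a rough sketch of how those three settings might map onto trainer arguments. It is not my exact config. Assumptions: the values are passed to TRL's `SFTConfig`, which accepts the usual `TrainingArguments` fields; the sequence-length field is called `max_seq_length` in older TRL releases and `max_length` in newer ones, so check your version; `bf16` and the `adamw_bnb_8bit` optimizer (which needs `bitsandbytes` installed) are just one possible choice of mixed precision and quantized optimizer. The helper name `fast_sft_kwargs` is made up for illustration.

```python
def fast_sft_kwargs(max_len=512, length_field="max_length"):
    """Build keyword arguments for the three speed-ups (hypothetical helper).

    length_field is "max_length" on newer TRL versions and
    "max_seq_length" on older ones (assumption: check your install).
    """
    return {
        length_field: max_len,      # 1. shorter context window
        "bf16": True,               # 2. mixed precision (fp16=True is the alternative)
        "optim": "adamw_bnb_8bit",  # 3. 8-bit AdamW, requires bitsandbytes
    }


kwargs = fast_sft_kwargs()

# Intended use, left commented out so the sketch runs without TRL installed:
# from trl import SFTConfig
# config = SFTConfig(output_dir="out", **kwargs)
```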

Step 1 had the biggest impact on training speed. I trained with the lowered context window for ~95% of the data I had and then increased it back to 2048 for the remaining 5%.
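
The two-phase schedule can be sketched as a simple split of the dataset by index. This is only an illustration under assumptions: the function name `two_phase_schedule`, the integer-percent split, and the idea of running one training pass per phase (resuming from the first phase's checkpoint) are not taken from my actual script.

```python
def two_phase_schedule(num_examples, short_percent=95, short_len=512, long_len=2048):
    """Split example indices into a short-context phase and a long-context phase."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    cut = num_examples * short_percent // 100
    return [
        {"start": 0, "stop": cut, "max_len": short_len},
        {"start": cut, "stop": num_examples, "max_len": long_len},
    ]


phases = two_phase_schedule(10_000)
for phase in phases:
    count = phase["stop"] - phase["start"]
    print(f"train on {count} examples with context length {phase['max_len']}")
```

Each phase would then be its own training run over `dataset.select(range(start, stop))` or similar, with the second run starting from the weights produced by the first.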
