SFTTrainer training very slow. Is this training speed expected?

Hello,

For anyone interested in the answer: this is the expected training speed for the provided hardware and model size. What I ended up doing to improve the speed substantially was:

  1. Lower the context length from 2048 to 512.
  2. Use mixed precision training.
  3. Use a quantized optimizer.
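
As a rough sketch, the three changes could be collected as keyword arguments like this. The helper name `speedup_settings` is hypothetical. The `bf16`, `fp16` and `optim` keys follow the Hugging Face `TrainingArguments` naming, and `"adamw_bnb_8bit"` is just one possible quantized optimizer (the post does not say which one was used). The name of the sequence-length key is an assumption and differs between TRL versions, so check the docs for the version you have installed.

```python
# Hypothetical helper: gathers the three speed-ups in one dict.
# Nothing here calls a training library; it only builds the settings.

def speedup_settings(max_seq_length=512, bf16_supported=True):
    return {
        # Step 1: shorter context window (key name is an assumption,
        # it varies between TRL versions).
        "max_seq_length": max_seq_length,
        # Step 2: mixed precision; use bf16 if the GPU supports it,
        # otherwise fall back to fp16.
        "bf16": bf16_supported,
        "fp16": not bf16_supported,
        # Step 3: a quantized optimizer. 8-bit AdamW via bitsandbytes
        # is an assumed choice, not necessarily the one used above.
        "optim": "adamw_bnb_8bit",
    }


settings = speedup_settings()
print(settings)
```

Depending on your library versions, most of these keys could then be passed on to the trainer's config object.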

Step 1 had the biggest impact on training speed. I trained with the lowered context window for ~95% of the data I had and then increased it back to 2048 for the remaining 5%.
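
A minimal sketch of that two-phase schedule, assuming the dataset is indexable and the examples are trained in order. The function name `two_phase_plan` and the dict keys are hypothetical; each phase would correspond to one training run (the second resumed from the first run's checkpoint) with the given context length.

```python
# Hypothetical helper: splits a dataset of num_examples into a long
# "short context" phase and a brief "full context" phase.

def two_phase_plan(num_examples, short_percent=95, short_len=512, long_len=2048):
    # Integer arithmetic avoids floating-point rounding at the boundary.
    cutoff = num_examples * short_percent // 100
    return [
        {"start": 0, "end": cutoff, "max_seq_length": short_len},
        {"start": cutoff, "end": num_examples, "max_seq_length": long_len},
    ]


phases = two_phase_plan(10_000)
for phase in phases:
    size = phase["end"] - phase["start"]
    print(f"{size} examples at context length {phase['max_seq_length']}")
```

Every example lands in exactly one phase, so no data is seen twice or skipped.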
