I am using Accelerate to train a model on multiple GTX 1080 GPUs. It takes ~3 sec to process 128 samples (16 per GPU). Even on A100 GPUs I am getting the same speed. But when I use the Trainer class, I get faster processing even on a single GPU. What could be the source of these differences?
Trainer uses Accelerate under the hood when it's available, so the difference is probably just that the Trainer's use of it is more optimized than yours.
It’s also possible that the Trainer makes use of other libraries on top of Accelerate. For example, libraries like FlashAttention can also have a big impact on throughput, as can settings like mixed precision that Trainer can enable for you.