Is a native PyTorch training loop much slower than Trainer?

You’re spot on! If `requires_grad` isn’t set to `False` for the earlier layers, PyTorch ends up training the whole model instead of just the last layer. Freezing the earlier layers by setting `requires_grad=False` helps focus training where it’s needed.
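Here’s a minimal sketch of what that looks like, assuming a BERT-style classification model from `transformers` (the checkpoint name and the `classifier` attribute are illustrative; your model’s head may be named differently):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Illustrative checkpoint; substitute your own.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Freeze every parameter first...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the classification head, so gradients
# are computed (and stored) for it alone.
for param in model.classifier.parameters():
    param.requires_grad = True

# Pass only the trainable parameters to the optimizer, so it
# doesn't allocate optimizer state for the frozen layers either.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5
)
```

This also keeps the optimizer from maintaining momentum/variance buffers for frozen weights, which saves memory on top of the compute savings from skipping their gradients.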