Very slow training (>5mins per batch) - code review request

I’d like some help with QARAC, my research project on creating language models that encode logic and consistency.

I’ve recently ported the code from TensorFlow to PyTorch, since I need to train three models together against a combination of four objectives, and PyTorch appears to be more suitable for this than TensorFlow. I thought it would be sensible to test the training script on my own laptop before spending lots of computing resources and money on training it. When I did so, I found that a single batch of data took over 5 minutes to process. This suggests to me that even with GPUs or TPUs, training this model would be intractable as it stands, and also that there are likely to be significant inefficiencies in my code.

I’d really appreciate it if somebody could go over the code with me and try to help me spot any problems with it.

You need to actually move your data and model to the GPU, i.e. `model.cuda()` and the same for all of your inputs (but write `x = x.cuda()`, since for tensors it returns a new tensor rather than modifying in place, as it does for models). Right now you’re training entirely on your CPU, which is why it’s so slow.
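A minimal sketch of the distinction above, using a stand-in `nn.Linear` model (an assumption; the real QARAC models are larger). Moving a module is in-place, but moving a tensor is not, so the result must be reassigned:

```python
import torch
import torch.nn as nn

# Stand-in model; substitute the actual QARAC models here.
model = nn.Linear(8, 2)

# Pick the GPU if one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Moving a module is in-place: .to() / .cuda() mutate the module's
# parameters (and also return the module, so chaining works).
model.to(device)

x = torch.randn(4, 8)
# Moving a tensor is NOT in-place: .to() / .cuda() return a new tensor,
# so the result must be assigned back.
x = x.to(device)

out = model(x)
```

Using `.to(device)` rather than `.cuda()` keeps the same script runnable on a CPU-only laptop and on a GPU box without code changes.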

I know that I need to do that, but I’m worried that if it’s this slow on CPU, it may still be impractically slow on GPUs, and I’d like to rule out any underlying inefficiency in the code before committing to that.
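One way to check for an underlying inefficiency before moving to GPUs is to profile a single training step and see which operators dominate the 5 minutes. A sketch with `torch.profiler`, again using a stand-in model (an assumption; substitute the real models and a real batch):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in model and batch; replace with the QARAC models and data.
model = torch.nn.Linear(512, 512)
x = torch.randn(32, 512)

# Profile one forward/backward pass on CPU.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    loss = model(x).sum()
    loss.backward()

# The table, sorted by total CPU time, shows which operators
# (or Python-side overhead) the batch time is actually spent in.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

If one operator or a Python-level loop dominates, that points to a genuine inefficiency; if the time is spread evenly across large matrix multiplications, a GPU is likely to help after all.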