PyTorch NLP model doesn’t use GPU when running inference

model.cuda() or model.to("cuda") puts the model on the GPU, but you also need to move the inputs to the GPU when the model is there; maybe that’s why you are getting the exception.
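
For example, a minimal sketch of what I mean (the tiny model and shapes here are just placeholders for your actual NLP model):

```python
import torch
import torch.nn as nn

# Toy stand-in for your model (hypothetical, just for illustration)
model = nn.Sequential(
    nn.Embedding(1000, 32),  # vocab size 1000, embedding dim 32
    nn.Linear(32, 2),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # same effect as model.cuda() when CUDA is available
model.eval()

# Dummy batch of token IDs. The .to(device) call is the key step:
# without it the tensor stays on the CPU and the forward pass raises
# a device-mismatch RuntimeError.
input_ids = torch.randint(0, 1000, (1, 16)).to(device)

with torch.no_grad():
    logits = model(input_ids)

print(logits.device)  # cuda:0 when a GPU is available
```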
Could you post your code snippet and the exception, so we can take a look?