PyTorch NLP model doesn’t use the GPU during inference

I have an NLP model trained with PyTorch that needs to run on a Jetson Xavier. I installed jetson-stats to monitor CPU and GPU usage. When I run the Python script, only the CPU cores show load; the GPU bar does not increase. I searched Google for "How to check if pytorch is using the GPU?" and went through the results. Following the advice given to others with a similar issue, I confirmed that CUDA is available and that a CUDA device is present on my Jetson Xavier. However, I don’t understand why the GPU bar does not change while the CPU core bars max out.

I don’t want to use the CPU because computation takes too long. As far as I can tell, it is using the CPU, not the GPU. How can I be sure, and if it is using the CPU, how can I switch it to the GPU?

Note: The model is taken from the Hugging Face transformers library. I tried calling the cuda() method on the model (model.cuda()). In that case the GPU is used, but I cannot get an output from the model; it raises an exception.

model.cuda() or model.to("cuda") puts the model on the GPU. When the model is on the GPU, you also need to put the inputs on the GPU; that may be why you are getting the exception.
Could you post your code snippet and the exception so we can take a look?
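
To illustrate the point about keeping the model and its inputs on the same device, here is a minimal sketch. It uses a toy embedding layer as a stand-in for the real model (toy_model and token_ids are hypothetical names, not from your script) and falls back to the CPU when no CUDA device is visible.

```python
import torch
from torch import nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-in for the real model (hypothetical). An embedding lookup is
# likely the kind of operation behind an index_select device error.
toy_model = nn.Embedding(num_embeddings=10, embedding_dim=4).to(device)

# Tensors are created on the CPU by default, so move the inputs as well.
token_ids = torch.tensor([[1, 2, 3]]).to(device)

with torch.no_grad():
    output = toy_model(token_ids)

# All three should report the same device.
print(next(toy_model.parameters()).device, token_ids.device, output.device)
```

Checking next(model.parameters()).device is also a quick way to confirm where a model currently lives. With a Hugging Face tokenizer called with return_tensors="pt", the returned encoding should be movable in the same way with .to(device).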

Here is the exception and code.

Expected object of device type cuda but got device type cpu for argument #3 ‘index’ in call to _th_index_select

from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
import torch

BERT_DIR = "savasy/bert-base-turkish-squad"    

tokenizer = AutoTokenizer.from_pretrained(BERT_DIR)
model = AutoModelForQuestionAnswering.from_pretrained(BERT_DIR)
nlp=pipeline("question-answering", model=model, tokenizer=tokenizer)

def infer(question, corpus):
    # Return (answer, score); fall back to (None, 0) if the pipeline raises.
    try:
        ans = nlp(question=question, context=corpus)
        return ans["answer"], ans["score"]
    except Exception:
        return None, 0

If you are using pipeline, you don’t need to put the model on the GPU manually. pipeline can handle that through its device parameter: just pass the GPU device number and it should work. You can also pass BERT_DIR directly to the model parameter, since pipeline can load the model itself. Try this and let me know.

nlp = pipeline("question-answering", model=BERT_DIR, device=0)
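
If the same script also has to run on machines without a GPU, one option is to choose the device index at runtime. This is only a sketch: pick_pipeline_device and build_qa_pipeline are hypothetical helper names, and it assumes the convention that device=-1 means CPU for pipeline.

```python
import torch


def pick_pipeline_device() -> int:
    """Return 0 (first GPU) if CUDA is visible to PyTorch, else -1 (CPU)."""
    return 0 if torch.cuda.is_available() else -1


def build_qa_pipeline(model_name: str):
    """Hypothetical helper: build a question-answering pipeline on the chosen device."""
    # Imported lazily so this sketch can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline(
        "question-answering",
        model=model_name,
        device=pick_pipeline_device(),
    )


print("pipeline device index:", pick_pipeline_device())
```

Calling nlp = build_qa_pipeline(BERT_DIR) and then printing nlp.model.device should show a cuda device when the GPU is actually in use.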

Thank you valhalla. It works on GPU now.