PyTorch NLP model doesn't use the GPU when running inference

I have an NLP model trained with PyTorch that needs to run on a Jetson Xavier. I installed jetson-stats to monitor CPU and GPU usage. When I run the Python script, only the CPU cores are under load; the GPU bar does not move at all. I searched Google with keywords like "How to check if PyTorch is using the GPU?" and went through the results on stackoverflow.com and elsewhere. Following the advice given to others with a similar issue, I confirmed that CUDA is available and that there is a CUDA device on my Jetson Xavier. Still, I don't understand why the GPU bar never changes while the CPU core bars are maxed out.

I don't want to run on the CPU; it takes far too long to compute. As far as I can tell, the model is using the CPU, not the GPU. How can I be sure, and if it is using the CPU, how can I switch it to the GPU?
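For reference, the check I ran (following those answers) was roughly this:

import torch

# both checks pass on the Xavier, yet the GPU stays idle during inference
print(torch.cuda.is_available())      # prints True
print(torch.cuda.get_device_name(0))  # prints the name of the CUDA device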

Note: The model comes from the Hugging Face transformers library. I have tried calling the cuda() method on the model (model.cuda()). In that case the GPU is used, but I cannot get an output from the model; it raises an exception instead.

model.cuda() or model.to("cuda") puts the model on the GPU, but when the model is on the GPU you also need to put the inputs on the GPU; maybe that's why you are getting the exception.
Could you post your code snippet and the exception, so we can take a look?
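As a rough sketch of what I mean (the model name here is just a placeholder for whatever you are loading), the pattern is to move both the model and the tokenized inputs to CUDA:

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

device = torch.device("cuda")

# "some-qa-model" is a placeholder, substitute your own checkpoint
tokenizer = AutoTokenizer.from_pretrained("some-qa-model")
model = AutoModelForQuestionAnswering.from_pretrained("some-qa-model").to(device)

inputs = tokenizer("What is asked?", "Some context text.", return_tensors="pt")
# every input tensor must live on the same device as the model weights
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)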

Here is the exception and code.

Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select

from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
import torch

BERT_DIR = "savasy/bert-base-turkish-squad"

tokenizer = AutoTokenizer.from_pretrained(BERT_DIR)
model = AutoModelForQuestionAnswering.from_pretrained(BERT_DIR)
nlp = pipeline("question-answering", model=model, tokenizer=tokenizer)


def infer(question, corpus):
    try:
        ans = nlp(question=question, context=corpus)
        return ans["answer"], ans["score"]
    except Exception:
        # any failure in the pipeline call falls through to the default return
        return None, 0

If you are using pipeline then you don't need to put the model on the GPU manually; pipeline can handle that through the device parameter. Just pass the GPU device number and it should work. Also, you can pass BERT_DIR directly to the model parameter; pipeline can load the model itself. Try this and let me know.

nlp = pipeline("question-answering", model=BERT_DIR, device=0)
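With the pipeline on the GPU, your infer() function should keep working unchanged, since the pipeline tokenizes the inputs and moves them to the CUDA device internally.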

Thank you valhalla. It works on the GPU now.

Hi, how do we determine the GPU device number? I am deploying my model to a SageMaker endpoint, if that matters.

UPDATE:
For anyone else wondering, 0 is the device ordinal of the first (and usually only) GPU; the default of -1 means CPU (see the definition of device from the pipeline documentation below):

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model on the associated CUDA device id.
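A minimal sketch, assuming torch is installed alongside transformers, of choosing the device ordinal programmatically instead of hard-coding it:

import torch
from transformers import pipeline

# device=-1 means CPU; 0 is the first CUDA device, 1 the second, and so on
device = 0 if torch.cuda.is_available() else -1

nlp = pipeline(
    "question-answering",
    model="savasy/bert-base-turkish-squad",
    device=device,
)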