Pytorch NLP model doesn’t use GPU when making inference

redrussianarmy · September 16, 2020, 7:36am

I have an NLP model trained with PyTorch that I want to run on a Jetson Xavier. I installed jetson-stats to monitor CPU and GPU usage. When I run the Python script, only the CPU cores are under load; the GPU bar does not increase. I searched Google for "How to check if pytorch is using the GPU?" and read the results on stackoverflow.com and elsewhere. Following the advice given there to people with a similar issue, I confirmed that CUDA is available and that a CUDA device is present on my Jetson Xavier. However, I don’t understand why the GPU bar does not change while the CPU core bars max out.

I don’t want to use the CPU because it takes too long to compute. I believe the model is running on the CPU, not the GPU. How can I be sure, and if it is using the CPU, how can I switch it to the GPU?

Note: The model is taken from the Hugging Face transformers library. I tried calling the cuda() method on the model (model.cuda()). In that case the GPU is used, but I cannot get an output from the model; it raises an exception.

valhalla · September 16, 2020, 8:54am

model.cuda() or model.to("cuda") puts the model on the GPU. When the model is on the GPU, you also need to put the inputs on the GPU; that may be why you are getting the exception.
Could you post your code snippet and the exception so we can take a look?
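
As a rough sketch of what "model and inputs on the same device" looks like when calling the model directly instead of through pipeline: the helper names move_to_device and answer_on_gpu below are made up for illustration, and reading the start and end logits from the first two model outputs is an assumption about the installed transformers version.

```python
def move_to_device(batch, device):
    """Return a dict in which every value that has a .to() method is moved to `device`."""
    return {
        name: value.to(device) if hasattr(value, "to") else value
        for name, value in batch.items()
    }


def answer_on_gpu(model_name, question, context):
    """Hypothetical helper: extractive QA with model and inputs on the same device."""
    # Imported here so move_to_device stays usable without these packages.
    import torch
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name).to(device)
    model.eval()

    # The tokenizer returns CPU tensors; move them to wherever the model lives.
    inputs = move_to_device(tokenizer(question, context, return_tensors="pt"), device)
    with torch.no_grad():
        outputs = model(**inputs)

    # Assumption: the first two outputs are the start and end logits.
    start = int(outputs[0].argmax())
    end = int(outputs[1].argmax())
    return tokenizer.decode(inputs["input_ids"][0][start : end + 1])
```

If the inputs were left on the CPU here, the forward pass would be expected to fail with a device-mismatch error similar to the one described in this thread.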

redrussianarmy · September 16, 2020, 11:48am

Here is the exception and code.

Expected object of device type cuda but got device type cpu for argument #3 ‘index’ in call to _th_index_select

from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
import torch

BERT_DIR = "savasy/bert-base-turkish-squad"    

tokenizer = AutoTokenizer.from_pretrained(BERT_DIR)
model = AutoModelForQuestionAnswering.from_pretrained(BERT_DIR)
nlp = pipeline("question-answering", model=model, tokenizer=tokenizer)


def infer(question, corpus):
    # Return (answer, score); fall back to (None, 0) if the pipeline raises.
    try:
        ans = nlp(question=question, context=corpus)
        return ans["answer"], ans["score"]
    except Exception:
        return None, 0

valhalla · September 16, 2020, 12:06pm

If you are using pipeline, you don’t need to put the model on the GPU manually; pipeline can handle that through the device parameter. Just pass the GPU device number and it should work. You can also pass BERT_DIR directly as the model parameter, since pipeline can load the model itself. Try this and let me know.

nlp = pipeline("question-answering", model=BERT_DIR, device=0)
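
A slightly more defensive variant, sketched under the assumption that the same script may also run on a machine without CUDA: it falls back to device=-1 (CPU) instead of hard-coding 0. The helper names pick_pipeline_device and build_qa_pipeline are hypothetical and not part of transformers.

```python
def pick_pipeline_device(cuda_available, gpu_index=0):
    """Map "is CUDA available?" to the ordinal expected by pipeline(device=...)."""
    return gpu_index if cuda_available else -1


def build_qa_pipeline(model_name):
    """Hypothetical helper: build a QA pipeline on the GPU when one is available."""
    # Imported here so pick_pipeline_device stays usable without these packages.
    import torch
    from transformers import pipeline

    device = pick_pipeline_device(torch.cuda.is_available())
    return pipeline("question-answering", model=model_name, device=device)
```

Usage would be nlp = build_qa_pipeline(BERT_DIR); inspecting nlp.device afterwards should show which device the pipeline ended up on.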

redrussianarmy · September 18, 2020, 1:27pm

Thank you valhalla. It works on GPU now.

jenpeper · January 5, 2024, 8:49pm

Hi, how do we determine the GPU device number? I am deploying my model to a SageMaker endpoint, if that matters.

UPDATE:
For anyone else wondering, 0 refers to the first GPU, while the device parameter itself defaults to -1, which means CPU (see the definition of device from the pipeline documentation below):

device (int, optional, defaults to -1) – Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model on the associated CUDA device id.
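
To see which ordinals are valid on a given machine, something like the following sketch may help. It relies on torch.cuda.device_count() and torch.cuda.get_device_name(); the helper names valid_device_ordinals and describe_devices are made up for illustration.

```python
def valid_device_ordinals(gpu_count):
    """-1 selects the CPU; 0 .. gpu_count - 1 select CUDA devices."""
    return [-1] + list(range(gpu_count))


def describe_devices():
    """Hypothetical helper: map each usable ordinal to a readable device name."""
    # Imported here so valid_device_ordinals stays usable without torch.
    import torch

    names = {-1: "cpu"}
    for ordinal in valid_device_ordinals(torch.cuda.device_count())[1:]:
        names[ordinal] = torch.cuda.get_device_name(ordinal)
    return names
```

On a single-GPU host the only CUDA ordinal is 0, which matches the device=0 used earlier in this thread.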
