Using GPU with transformers

Hi! I am pretty new to Hugging Face and I am struggling with the next sentence prediction model. I would like it to run on a GPU device inside a Colab notebook, but I have not been able to get it to. This is my attempt:

from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased', return_dict=True)
model.to("cuda:0")  # move the model weights to the GPU

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
next_sentence = "The sky is blue due to the shorter wavelength of blue light."

tokenizer_output = tokenizer(prompt, next_sentence, return_tensors='pt')
# move every input tensor to the same device as the model
tokens_tensor = tokenizer_output['input_ids'].to('cuda:0')
token_type_ids = tokenizer_output['token_type_ids'].to('cuda:0')
attention_mask = tokenizer_output['attention_mask'].to('cuda:0')
encoding = {'input_ids': tokens_tensor,
            'token_type_ids': token_type_ids,
            'attention_mask': attention_mask}

outputs = model(**encoding)
logits = outputs.logits
print(logits)  # next sentence was random
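
(For what it is worth, the two logits follow the convention documented for BertForNextSentencePrediction: index 0 means "sentence B continues sentence A" and index 1 means "sentence B is random". A minimal sketch of how one might read them:)

# argmax over the two classes; 1 = "random next sentence", 0 = "next sentence follows"
predicted_class = logits.argmax(dim=1).item()
print("random next sentence" if predicted_class == 1 else "next sentence follows")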

However, it does not work. The GPU device is set up properly, since I checked it with the following command:

import torch; print(torch.cuda.get_device_name(0))

I would appreciate any help.

What do you mean by “doesn’t work”? What errors do you get, or how do you know the GPU isn’t being used?

No, it does not return any error.
I think it is not using the GPU, because when I timed the code with and without the “GPU instructions” it was not any faster; it was even a little bit slower.

Sorry, but that is not enough information to go on: “you think” it is not using the GPU. You can monitor your GPU usage with something like nvidia-smi or your task manager. Furthermore, trying this on a single item will probably not show a big difference either if you include the model loading in the timing, because that is the slowest part for a single item.

We will need more information and a more thorough investigation on your part.
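
For example, from inside the notebook you can check where the weights and the inputs actually live and how much GPU memory is allocated (a rough sketch, re-using model and tokens_tensor from your first snippet):

import torch

print(torch.cuda.is_available())              # True if a CUDA device is visible
print(next(model.parameters()).device)        # should print cuda:0 after model.to("cuda:0")
print(tokens_tensor.device)                   # should also print cuda:0
print(torch.cuda.memory_allocated() / 1e6, "MB allocated on the GPU")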

Hi @BramVanroy, thank you for your answers. As you suggested, one item was not enough to test the GPU improvement, and I was also including the model loading in my timing.
Finally, I managed to simplify my code (just in case it helps somebody):

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
next_sentence = "The sky is blue due to the shorter wavelength of blue light."
encoding = tokenizer.encode_plus(prompt, next_sentence, add_special_tokens=True, max_length=tokenizer.max_len, return_tensors='pt').to('cuda:0')
outputs = model(**encoding, next_sentence_label=torch.cuda.LongTensor([1]))
t = outputs.loss
print(t)
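
(As a follow-up on the timing advice: the difference only shows up when the model loading is excluded and the forward pass is repeated; a rough sketch, re-using model and encoding from above and assuming a CUDA device is available:)

import time
import torch

label = torch.LongTensor([1]).to('cuda:0')
torch.cuda.synchronize()                      # finish any pending GPU work before timing
start = time.perf_counter()
with torch.no_grad():
    for _ in range(100):                      # repeat the forward pass on the same batch
        model(**encoding, next_sentence_label=label)
torch.cuda.synchronize()                      # wait for the GPU to finish
print(f"100 forward passes took {time.perf_counter() - start:.2f} s")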

Thank you again, and sorry for bothering you with this issue, but as I said, I am quite new to this transformers “world”.