BLOOM models don't run on my GPU using Transformers

TornButter · September 18, 2022, 6:12pm

The following code runs successfully, but only on my CPU: it maxes out a few cores while my 3090's usage stays at 0%:
import torch
from transformers import BloomTokenizerFast, BloomForCausalLM
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
# Without a device argument, from_pretrained() loads the weights onto the CPU.
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
prompt = "Dave picked up the baseball and"
result_length = 100
# The tokenizer returns PyTorch tensors on the CPU.
inputs = tokenizer(prompt, return_tensors="pt")
raw = model.generate(inputs["input_ids"], max_length=result_length)[0]
print(tokenizer.decode(raw))

However, I want to use my GPU. I have tried other model sizes such as bloom-1b7, with the same result. When I use accelerate and pass device_map="auto", torch_dtype="auto", the model runs on my GPU, but when trying to decode I get this error: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!". I get the same error when I add device = torch.device("cuda:0") and append .to(device) to the model line, and likewise with .cuda(). I made sure to configure accelerate not to use my CPU. What am I doing wrong?

TornButter · September 18, 2022, 6:29pm

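I found the solution: the tokenized inputs have to be moved to the GPU too, not just the model. I added these two lines before the generate call, then passed inputs2 to generate instead of inputs: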

device = torch.device("cuda:0")
inputs2 = inputs.to(device)
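The tokenizer always builds its tensors on the CPU, so they have to follow the model to the GPU before generate() runs. Below is a minimal sketch of just that step, assuming torch and transformers are installed. The token IDs are made-up placeholders standing in for real tokenizer output, and the CPU fallback is an addition that is not part of the fix above. It shows that BatchEncoding.to() moves every tensor the tokenizer returned, including the attention_mask, not only input_ids.

```python
import torch
from transformers import BatchEncoding

# Fall back to the CPU so the sketch also runs on machines without a GPU
# (an assumption added here; the fix above hard-codes "cuda:0").
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in for tokenizer(prompt, return_tensors="pt"): placeholder token IDs,
# created on the CPU just like real tokenizer output.
inputs = BatchEncoding({
    "input_ids": torch.tensor([[1, 2, 3]]),
    "attention_mask": torch.ones(1, 3, dtype=torch.long),
})

# BatchEncoding.to() moves every tensor it holds to the target device.
inputs2 = inputs.to(device)
print({name: t.device.type for name, t in inputs2.items()})
```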

Here is the complete working code:
import torch
from transformers import BloomTokenizerFast, BloomForCausalLM
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
# .cuda() moves the model's weights to the current GPU (cuda:0 by default).
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m").cuda()
prompt = "Dave picked up the baseball and"
result_length = 100
inputs = tokenizer(prompt, return_tensors="pt")
# The tokenizer output is on the CPU; move it to the same GPU as the model.
device = torch.device("cuda:0")
inputs2 = inputs.to(device)
raw = model.generate(inputs2["input_ids"], max_length=result_length)[0]
print(tokenizer.decode(raw))
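A slightly more portable variant of the same fix, offered as a sketch rather than the thread's exact code: instead of hard-coding "cuda:0", it sends the inputs to model.device (the device of the model's parameters), falls back to the CPU when no GPU is found, and passes **encoded so that the attention_mask reaches generate() along with input_ids. The helper names pick_device and generate_on_model_device are hypothetical, not part of Transformers.

```python
import torch
from transformers import PreTrainedModel


def pick_device() -> torch.device:
    """Hypothetical helper: prefer the first GPU, fall back to the CPU."""
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def generate_on_model_device(model: PreTrainedModel, encoded, max_length: int):
    """Hypothetical helper: move tokenizer output to model.device, then generate.

    `encoded` is the dict-like output of tokenizer(..., return_tensors="pt").
    """
    # Follow the model's device instead of hard-coding "cuda:0".
    encoded = {name: tensor.to(model.device) for name, tensor in encoded.items()}
    # **encoded forwards attention_mask as well as input_ids;
    # [0] keeps the first (and only) sequence in the batch.
    return model.generate(**encoded, max_length=max_length)[0]
```

With the tokenizer, prompt and result_length from the script above, usage would be model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m").to(pick_device()) followed by raw = generate_on_model_device(model, tokenizer(prompt, return_tensors="pt"), result_length).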
