Not using GPU although it is specified

Hello, I am new to the huggingface library and I am currently going over the course.

I want to fine-tune a BERT model on a dataset (just as demonstrated in the course), but when I run it, the estimated runtime is more than 20 hours.

I therefore tried to run the code with my GPU by importing torch, but the time does not go down.

However, in the course, it says it should only take a couple of minutes with a GPU.

Can someone explain what I am doing wrong? I have an NVIDIA RTX 2060, 16GB of DDR4 RAM and an AMD Ryzen 7.

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

from transformers import TrainingArguments

training_args = TrainingArguments("test-trainer")

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = model.to(device)
device  # This outputs 'cuda'

from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer
)

trainer.train()

The code comes almost directly from this link: Fine-tuning a pretrained model - Hugging Face Course

When running, I see a very small spike in GPU usage in Task Manager at the beginning, but then the GPU load returns to 0.

Thank you in advance!

Hey @blommeolivier, can you try running the commands in this Stack Overflow answer?
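
The linked answer is not reproduced in this thread. As a rough sketch of that kind of check, assuming PyTorch is the backend, the snippet below collects what PyTorch reports about CUDA. The helper name describe_gpu_setup is made up for illustration; the torch calls it wraps (torch.cuda.is_available, torch.cuda.device_count, torch.cuda.get_device_name, torch.version.cuda) are standard PyTorch.

```python
def describe_gpu_setup():
    """Return a dict describing what PyTorch can see (hypothetical helper)."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "built_with_cuda": None,
        "cuda_available": False,
        "device_count": 0,
        "device_names": [],
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing more to report.
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    info["built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_names"] = [
            torch.cuda.get_device_name(i) for i in range(info["device_count"])
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_setup().items():
        print(f"{key}: {value}")
```

If built_with_cuda comes back as None, the installed PyTorch is likely a CPU-only build, which would be worth ruling out before looking at the Trainer itself.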

One line that looks a bit odd in your code is:

model = model.to(device)

This is probably not the source of the problem (since nn.Module.to() returns self), but is not the conventional way of placing the model on the device. The Trainer does the device placement automatically for you, so you could either remove that line or try:

model.to(device)
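
One way to see where the Trainer actually placed the model is to list the devices holding its parameters. This is only a sketch: parameter_devices is a hypothetical helper, and it assumes a PyTorch model (anything with a .parameters() method whose items have a .device attribute).

```python
def parameter_devices(model):
    """Return the sorted, de-duplicated device names holding the model's parameters.

    Hypothetical helper: works with any object exposing .parameters() whose
    items carry a .device attribute, such as a torch.nn.Module.
    """
    return sorted({str(param.device) for param in model.parameters()})


# Example usage (assumes `trainer` is the Trainer built earlier in the thread):
# print(parameter_devices(trainer.model))
```

If the result lists only a CPU device after the Trainer has been created, the model is not on the GPU, whatever device was requested earlier.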

Hello, I am having a similar issue where my model is not training on the GPU even though it is specified. I am trying to further pre-train a BERT model on domain-specific documents using AutoModelForMaskedLM with PyTorch. I have GPUs available (cuda.is_available() returns True) and called model.to(device). It seems the model briefly goes onto the GPU, then trains entirely on the CPU. Does anyone have advice on how to fix this? I've tried converting the training data to PyTorch tensors, which gives errors. Thanks.

Here's a snippet of my code:

from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling
from transformers import TrainingArguments, Trainer
from torch import cuda
from datasets import Dataset, concatenate_datasets
import pandas as pd  # needed for pd.DataFrame below

device = 'cuda' if cuda.is_available() else 'cpu'
cuda.empty_cache()
print(device)

gives output 'cuda'

model_checkpoint = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, truncation=True, padding='max_length', return_special_tokens_mask=True)

text_df = pd.DataFrame({'Text':text})

# set up train and eval dataset
train_size=0.8
train_dataset = text_df.sample(frac=train_size,random_state=200)
test_dataset = text_df.drop(train_dataset.index).reset_index(drop=True)
train_dataset = train_dataset.reset_index(drop=True)

print("defined training and test set")
def tokenize(text_df, tokenizer):
    tokenized_inputs = tokenizer(text_df["Text"], is_split_into_words=False, padding='max_length', 
                                 truncation=True, 
                                 return_special_tokens_mask=True)# , return_tensors="pt").to(device) #commented out bc gives errors
    return tokenized_inputs

train_data = Dataset.from_pandas(train_dataset).map(tokenize,
    fn_kwargs={'tokenizer':tokenizer},
    remove_columns=['Text'])
#train_data.set_format("torch")
test_data = Dataset.from_pandas(test_dataset).map(tokenize,
    fn_kwargs={'tokenizer':tokenizer},
    remove_columns=['Text'])
#test_data.set_format("torch")
print("tokenized data")

test_labels = Dataset.from_pandas(pd.DataFrame({'labels':test_data['input_ids'].copy()}))
train_labels = Dataset.from_pandas(pd.DataFrame({'labels':train_data['input_ids'].copy()}))
test_data = concatenate_datasets([test_data, test_labels], axis=1)
train_data = concatenate_datasets([train_data, train_labels], axis=1)


#initiating model
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)
model.to(device)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer)
args = TrainingArguments(
    save_path,  # output directory, defined elsewhere
    evaluation_strategy="steps",
    save_strategy="epoch",
    learning_rate=1e-3,
    num_train_epochs=1,
    weight_decay=0.01,
    push_to_hub=False,
    per_device_train_batch_size = 8,#256,
    per_device_eval_batch_size = 8,#256,
    logging_steps=50,
    eval_steps = 50,
    save_total_limit = 3, #saves only last 3 checkpoints
    gradient_accumulation_steps=32,#64,
    gradient_checkpointing=True,
    fp16=True,
    optim="adafactor"
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    eval_dataset=test_data,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

train_result = trainer.train()

EDIT: I think there could be a compatibility issue between PyTorch and CUDA. It looks like I have CUDA 11.7, but my PyTorch installation is for 11.6? This shouldn't make a difference, though, from what I've read.

Solution:
If you are having this problem like I was, you need to check your CUDA versions.

See Trainer.

Run which nvcc (to find the install path) or nvcc -V (to print the version) to get your CUDA version. If this does not work, you either don't have the system-level CUDA toolkit or it needs to be added to your PATH. This nvcc version must match the CUDA version PyTorch was installed with. If you need to install a new system-level CUDA, you must uninstall and reinstall both PyTorch and Hugging Face Transformers afterwards.
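
The comparison described above can be sketched in Python. This is a minimal illustration, not an official check: the function names are hypothetical, and the parsing assumes nvcc -V prints a line containing "release X.Y", as current CUDA toolkits do. torch.version.cuda is the CUDA version PyTorch was built against (None for CPU-only builds).

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the 'major.minor' CUDA version from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def system_cuda_version():
    """Return the CUDA version reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    print("nvcc CUDA version:   ", system_cuda_version())
    print("PyTorch CUDA version:", torch_cuda_version())
```

If the two printed versions differ, or either is None, that points to the mismatch or missing install described above.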


Try updating transformers

pip install -U transformers

Hey @lewtun, I'm hoping you or anyone can help. I'm having this same problem, but the difference is that I have an AMD device and am trying to use DirectML or OpenCL. I create a device and call model.to(torch.device("ocl:0")), and the logs show the model moving to my GPU. I then see GPU utilization spike, but the model gets moved back to the CPU and trains on the CPU after that. How can I set up the Trainer so that it trains fully on the device I specify?
