Sending a Dataset or DatasetDict to a GPU

Hi, relatively new user of Hugging Face here, trying to do multi-label classification, and basing my code off this example.

I have put my own data into a DatasetDict format as follows:

from datasets import Dataset, DatasetDict

df2 = df[['text_column', 'answer1', 'answer2']].head(1000)
df2['text_column'] = df2['text_column'].astype(str)
dataset = Dataset.from_pandas(df2)

# train/test/validation split
train_testvalid = dataset.train_test_split(test_size=0.1)
test_valid = train_testvalid["test"].train_test_split(test_size=0.5)

# put into DatasetDict
datasets = DatasetDict({
    "train": train_testvalid["train"],
    "test": test_valid["test"],
    "valid": test_valid["train"]})

Later, I load the model using

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", 
                                                           problem_type="multi_label_classification", 
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id).to(device)

and then check that the model is on the GPU using next(model.parameters()).is_cuda; if I comment out .to(device), the model is not sent to the GPU.
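(For context, device here is just the standard PyTorch device handle, defined earlier along the lines of:)

import torch

# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")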

My problem comes when it's time to train the model as follows:

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"].to(device),
    eval_dataset=encoded_dataset["valid"].to(device),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()

which causes the error:

AttributeError: 'Dataset' object has no attribute 'to'

But if I donā€™t try and send the train and eval datasets to GPU, I get the error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

So my question is: how can I send these datasets to the GPU - where the model already is - in order to efficiently train and validate the model using them?

Thank you!


By default, the Trainer will use the GPU if it is available. It will automatically put the model on the GPU as well as each batch as soon as that's necessary. So just remove all the .to() calls that you made manually.


Hi! As @BramVanroy pointed out, our Trainer class uses GPUs by default (if they are available from PyTorch), so you don't need to manually send the model to the GPU. And to fix the issue with the datasets, set their format to torch with .with_format("torch") to return PyTorch tensors when indexed.
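For example, something along these lines (assuming encoded_dataset is your tokenized DatasetDict):

# returns a copy of the dataset whose samples come back as PyTorch tensors when indexed
encoded_dataset = encoded_dataset.with_format("torch")
print(type(encoded_dataset["train"][0]["input_ids"]))  # <class 'torch.Tensor'>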


Thank you both for responding so quickly.

I can confirm that a GPU is available using torch.cuda.is_available(), and I have also done .set_format("torch") on the Datasets. I have removed any explicit .to() calls.

However, if I remove the explicit .to() call on the model, then the model is no longer on the GPU according to next(model.parameters()).is_cuda, which returns False.

More importantly, I also still get the RuntimeError:
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I was trying to post a minimal example above, but I now suspect that the problem is perhaps that some of the encoding is done on the GPU? So here is a larger section of code - apologies in advance if this is excessive, but I'm not sure which part is causing the error.

from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_data(examples):
  # take a batch of texts
  text = examples["answer_no_tags"]
  # encode them
  encoding = tokenizer(text, padding="max_length", truncation=True, max_length=128)
  # add labels
  labels_batch = {k: examples[k] for k in examples.keys() if k in labels}
  # create numpy array of shape (batch_size, num_labels)
  labels_matrix = np.zeros((len(text), len(labels)))
  # fill numpy array
  for idx, label in enumerate(labels):
    labels_matrix[:, idx] = labels_batch[label]

  encoding["labels"] = labels_matrix.tolist()
  
  return encoding

encoded_dataset = datasets.map(preprocess_data, batched=True, 
                              remove_columns=dataset.column_names)

Loading the model:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", 
                                                           problem_type="multi_label_classification", 
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id)

from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    f"bert-finetuned-sem_eval-english",
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
    #push_to_hub=True,
)

from sklearn.metrics import f1_score, roc_auc_score, accuracy_score
from transformers import EvalPrediction
import torch
    
# source: https://jesusleal.io/2021/04/21/Longformer-multilabel-classification/
def multi_label_metrics(predictions, labels, threshold=0.5):
    # first, apply sigmoid on predictions which are of shape (batch_size, num_labels)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    # next, use threshold to turn them into integer predictions
    y_pred = np.zeros(probs.shape)
    y_pred[np.where(probs >= threshold)] = 1
    # finally, compute metrics
    y_true = labels
    f1_micro_average = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
    roc_auc = roc_auc_score(y_true, y_pred, average = 'micro')
    accuracy = accuracy_score(y_true, y_pred)
    # return as dictionary
    metrics = {'f1': f1_micro_average,
               'roc_auc': roc_auc,
               'accuracy': accuracy}
    return metrics

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    result = multi_label_metrics(
        predictions=preds,
        labels=p.label_ids)
    return result

Finally, training:

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["valid"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()

Which returns:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

Is this the entire code? I can't find the part where you change the dataset format to torch.

The model will be moved to the GPU after you initialize the trainer - not before that.

You can verify that the trainer will make use of the GPU by checking trainer.args.device. If that is a GPU, then everything the trainer does will correctly use the GPU.
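For example:

print(trainer.args.device)                      # e.g. device(type='cuda', index=0)
print(next(trainer.model.parameters()).device)  # the Trainer moves the model here when it is initialized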

What I suspect instead is that there is a discrepancy between devices in your custom multi_label_metrics function, which the trainer of course does not control. Check whether predictions and labels are on the same device.


Oh, I think this is a Transformers bug (see When running the Trainer cell, it found two devices (cuda:0 and CPU) · Issue #31 · nlp-with-transformers/notebooks · GitHub). Updating Transformers to the newest version with pip install -U transformers should fix the issue.
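After upgrading, you can quickly confirm which version is actually being picked up:

import transformers
print(transformers.__version__)  # should show the newly installed version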


Thanks for your help @BramVanroy and @mariosasko - much appreciated. I updated Transformers and that has fixed the error, so I have marked it as the solution. Cheers.

Also, remember to update the 'accelerate' package to keep it compatible.


Hi @BramVanroy,

I'm experimenting with a Mac M1 using transformers 4.21.1 and pytorch 1.13.0.dev20220803.

Is the "Trainer will use the GPU if it is available" behaviour also true in the case of M1 ("mps")? I'm also having issues with this matter. I can't train with the M1 GPU, only the CPU.

Thanks

You can have a look at this issue for more: TrainingArguments does not support `mps` device (Mac M1 GPU) · Issue #17971 · huggingface/transformers · GitHub

Thanks. Already saw it. But even so there are some problems. First, I was getting bad results when using mps (this issue is solved). A second issue is related to performance: I'm not seeing a considerable increase in speed compared to the CPU. The issue is also addressed here.

Before running the Trainer, I want to manually compute the loss:

labels = train_data['labels'][0]
input_ids = train_data['input_ids'][0]
attention_mask = train_data['attention_mask'][0]
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs.loss

What's the way to manually convert the dataset to GPU tensors?

Hi! You can do train_data.set_format("torch", device="cuda") to send the dataset's samples to the GPU when indexing into the dataset.
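For example, a minimal sketch of your manual loss computation with that in place (assuming model is already on the GPU and that train_data has input_ids, attention_mask, and labels columns):

train_data.set_format("torch", device="cuda")  # indexed samples now come back as CUDA tensors

# take a single example and add the batch dimension the model expects
input_ids = train_data["input_ids"][0].unsqueeze(0)
attention_mask = train_data["attention_mask"][0].unsqueeze(0)
labels = train_data["labels"][0].unsqueeze(0)

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(outputs.loss)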


Hi,
I understand that if a GPU is available the trainer will automatically send the model to the GPU, but will the trainer also automatically send my tokenized inputs and labels to the GPU without me having to explicitly include .to(device)?
My guess is yes, but I would still like to double check.

I have been trying to fine-tune the nllb translation model, and I was wondering if I need to include .to(device) calls in my compute_metrics function?

metric = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}

    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result

This worked. We have to do this to perform fine-tuning on a GPU; without setting the data to data.with_format("torch"), it will be processed on the CPU.

Hello,

I am trying to fine-tune a model on my M1 Mac using mps, but get "TypeError: can't convert mps:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first." inside the transformers/data/data_collator.py file.

So I instantiate the tokenizer, move the model to mps, load the dataset, move the dataset to mps with set_format (without it I get "Placeholder storage has not been allocated on MPS device!", which is due to the data not being moved to mps), then instantiate and run the Trainer:

tokenizer = AutoTokenizer.from_pretrained(self.config.model_ckpt)
model_pegasus = AutoModelForSeq2SeqLM.from_pretrained(self.config.model_ckpt).to(torch.device('mps'))
seq2seq_DC = DataCollatorForSeq2Seq(tokenizer, model = model_pegasus)
ds = load_from_disk(path_to_ds)
ds.set_format('torch', device="mps")
trainer = Trainer(
                    model = model_pegasus,
                    args = trainer_args,
                    tokenizer = tokenizer,
                    data_collator=seq2seq_DC,
                    train_dataset=ds['train'],
                    eval_dataset=ds['validation']
                )
trainer.train()

Any ideas why this happens, and why the Trainer doesn't automatically move the inputs to the correct device?