Sending a Dataset or DatasetDict to a GPU

Hi, relatively new user of Hugging Face here, trying to do multi-label classification, and basing my code off this example.

I have put my own data into a DatasetDict format as follows:

from datasets import Dataset, DatasetDict

df2 = df[['text_column', 'answer1', 'answer2']].head(1000)
df2['text_column'] = df2['text_column'].astype(str)
dataset = Dataset.from_pandas(df2)

# train/test/validation split
train_testvalid = dataset.train_test_split(test_size=0.1)
test_valid = train_testvalid["test"].train_test_split(test_size=0.5)

# put into DatasetDict
datasets = DatasetDict({
    "train": train_testvalid["train"],
    "test": test_valid["test"],
    "valid": test_valid["train"]})

Later, I load the model using

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", 
                                                           problem_type="multi_label_classification", 
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id).to(device)

and then check that the model is on the GPU using next(model.parameters()).is_cuda; if I comment out .to(device), the model is not sent to the GPU.

My problem comes when it's time to train the model as follows:

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"].to(device),
    eval_dataset=encoded_dataset["valid"].to(device),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()

which causes the error:

AttributeError: 'Dataset' object has no attribute 'to'

But if I don't try to send the train and eval datasets to the GPU, I get the error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

So my question is: how can I send these datasets to the GPU - where the model already is - in order to efficiently train and validate the model using them?

Thank you!

By default, the Trainer will use the GPU if it is available. It will automatically put the model on the GPU, as well as each batch, as soon as that's necessary. So just remove all the .to() calls that you made manually.
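
For example, a minimal sketch (reusing the model, args, and encoded_dataset names from your post; note that nothing here touches devices explicitly):

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")  # no .to(device)
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["valid"],
)
trainer.train()  # the Trainer moves the model and each batch to the GPU for you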

Hi! As @BramVanroy pointed out, our Trainer class uses GPUs by default (if they are available from PyTorch), so you don't need to manually send the model to the GPU. And to fix the issue with the datasets, set their format to torch with .with_format("torch") so that they return PyTorch tensors when indexed.
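
For example (a sketch based on the encoded_dataset from your snippet):

# with_format returns new Dataset objects whose items are PyTorch tensors
encoded_dataset = encoded_dataset.with_format("torch")
print(type(encoded_dataset["train"][0]["input_ids"]))  # <class 'torch.Tensor'>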

Thank you both for responding so quickly.

I can confirm that the GPU is available using torch.cuda.is_available(), and I have also done .set_format("torch") on the Datasets. I have removed any explicit .to() calls.

However, if I remove the explicit .to() call on the model, then the model is no longer on the GPU according to next(model.parameters()).is_cuda, which returns False.

More importantly, I also still get the RuntimeError:
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I was trying to post a minimal example above, but I now suspect that the problem is that some of the encoding is done on the GPU, perhaps? So here is a larger section of code - apologies in advance if this is excessive, but I'm not sure which part is causing the error.

from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_data(examples):
  # take a batch of texts
  text = examples["answer_no_tags"]
  # encode them
  encoding = tokenizer(text, padding="max_length", truncation=True, max_length=128)
  # add labels
  labels_batch = {k: examples[k] for k in examples.keys() if k in labels}
  # create numpy array of shape (batch_size, num_labels)
  labels_matrix = np.zeros((len(text), len(labels)))
  # fill numpy array
  for idx, label in enumerate(labels):
    labels_matrix[:, idx] = labels_batch[label]

  encoding["labels"] = labels_matrix.tolist()
  
  return encoding

encoded_dataset = datasets.map(preprocess_data, batched=True, 
                              remove_columns=dataset.column_names)

Loading the model:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", 
                                                           problem_type="multi_label_classification", 
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id)

from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    "bert-finetuned-sem_eval-english",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
    #push_to_hub=True,
)

from sklearn.metrics import f1_score, roc_auc_score, accuracy_score
from transformers import EvalPrediction
import torch
    
# source: https://jesusleal.io/2021/04/21/Longformer-multilabel-classification/
def multi_label_metrics(predictions, labels, threshold=0.5):
    # first, apply sigmoid on predictions which are of shape (batch_size, num_labels)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    # next, use threshold to turn them into integer predictions
    y_pred = np.zeros(probs.shape)
    y_pred[np.where(probs >= threshold)] = 1
    # finally, compute metrics
    y_true = labels
    f1_micro_average = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
    roc_auc = roc_auc_score(y_true, y_pred, average = 'micro')
    accuracy = accuracy_score(y_true, y_pred)
    # return as dictionary
    metrics = {'f1': f1_micro_average,
               'roc_auc': roc_auc,
               'accuracy': accuracy}
    return metrics

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    result = multi_label_metrics(
        predictions=preds, 
        labels=p.label_ids)
    return result

Finally, training:

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["valid"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()

Which returns:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

Is this the entire code? I can't find the part where you change the dataset format to torch.

The model will be moved to the GPU after you initialize the trainer - not before that.

You can verify that the trainer will make use of the GPU by checking trainer.args.device. If that is a GPU, then everything the trainer does will correctly use the GPU.
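
For example:

print(trainer.args.device)  # prints something like cuda:0 when a GPU will be used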

What I suspect instead is that there is a discrepancy between devices in your custom multi_label_metrics function, which the trainer of course does not control. Check whether predictions and labels are on the same device.

Oh, I think this is a Transformers bug (see When running the Trainer cell, it found two devices (cuda:0 and CPU) · Issue #31 · nlp-with-transformers/notebooks · GitHub). Updating Transformers to the newest version with pip install -U transformers should fix the issue.

Thanks for your help @BramVanroy and @mariosasko - much appreciated. I updated Transformers and that has fixed the error, so I have marked this as the solution. Cheers.

Also, remember to update the accelerate package to keep it compatible.
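
That is:

pip install -U accelerate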

Hi @BramVanroy,

I'm experimenting with a Mac M1, using transformers 4.21.1 and pytorch 1.13.0.dev20220803.

Is the "Trainer will use the GPU if it is available" behaviour also true in the case of the M1 ("mps")? I'm also having issues with this; I can't train with the M1 GPU, only the CPU.

Thanks

You can have a look at this issue for more: TrainingArguments does not support `mps` device (Mac M1 GPU) · Issue #17971 · huggingface/transformers · GitHub
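
In the meantime, a quick way to confirm that PyTorch itself can see the M1 GPU (a minimal sketch; torch.backends.mps requires PyTorch 1.12 or later):

import torch

# True only if PyTorch was built with MPS support and the GPU is usable
print(torch.backends.mps.is_available())
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")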

Thanks, I already saw it. But even so, there are some problems. First, I was getting bad results when using mps (this issue is solved). A second issue is related to performance: I'm not seeing a considerable increase in speed compared to the CPU. The issue is also addressed here.

Before running the Trainer, I want to manually compute the loss:

labels = train_data['labels'][0]
input_ids = train_data['input_ids'][0]
attention_mask = train_data['attention_mask'][0]
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs.loss

What's the way to manually convert the dataset to GPU tensors?

Hi! You can do train_data.set_format("torch", device="cuda") to send the dataset's samples to the GPU when indexing into the dataset.
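
For example (a short sketch using your train_data):

# after this call, indexing the dataset returns tensors already on the GPU
train_data.set_format("torch", device="cuda")
print(train_data['input_ids'][0].device)  # cuda:0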

Hi
I understand that if a GPU is available the Trainer will automatically send the model to the GPU, but will the Trainer also automatically send my tokenized inputs and labels to the GPU, without me having to explicitly call .to(device)?
My guess is yes, but I would still like to double check.

I have been trying to fine-tune the nllb translation model, and I was wondering whether I need to call .to(device) in my compute_metrics function?

import evaluate
import numpy as np

metric = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}

    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result