Can I use CUDA with Trainer.train?

Hello,
I'm having a problem using CUDA with Trainer.train(). I don't run into it when I train models with a plain PyTorch training loop, but I would love to understand what I'm getting wrong so that I can use this powerful class as well.
I'm sure it's something very silly, but I'm a beginner and can't figure out what I'm doing wrong!

Transformers version: 4.11.3
I got this error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

With the following code, which I put together by playing around with the examples from the course:

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset
import torch

def tokenize_function(example):
    return tokenizer(example["sentence"], padding="max_length", truncation=True, max_length=256)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
checkpoint = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

raw_datasets = load_dataset("glue", "cola")
raw_datasets_tokenized = raw_datasets.map(tokenize_function)

# keep only the columns the model expects
tokenized_datasets = raw_datasets_tokenized.remove_columns(["sentence", "idx"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")
tokenized_datasets["train"].column_names  # sanity check on the remaining columns

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2).to(device)
training_args = TrainingArguments("test-trainer")

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()
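
For reference, a quick check like the one below (reusing the objects defined above; the printed devices are what I would expect, not verified output) shows the mismatch the error message is pointing at: the model parameters are on the GPU while the formatted dataset still returns CPU tensors.

# rough diagnostic sketch, reusing the objects defined above
print(next(model.parameters()).device)                      # expected: cuda:0
print(tokenized_datasets["train"][0]["input_ids"].device)   # expected: cpu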

I understood that this happens because my data is not on CUDA, so I tried:

def tokenize_function(example):
    return tokenizer(example["sentence"], padding="max_length", truncation=True, max_length=256, return_tensors="pt").to(device)

but when calling trainer.train(), I got ValueError: too many values to unpack (expected 2):

ValueError                                Traceback (most recent call last)
Input In [21], in ()
----> 1 trainer.train()

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/trainer.py:1316, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1314 tr_loss_step = self.training_step(model, inputs)
1315 else:
→ 1316 tr_loss_step = self.training_step(model, inputs)
1318 if (
1319 args.logging_nan_inf_filter
1320 and not is_torch_tpu_available()
1321 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1322 ):
1323 # if loss is nan or inf simply add the average of previous logged losses
1324 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/trainer.py:1849, in Trainer.training_step(self, model, inputs)
1847 loss = self.compute_loss(model, inputs)
1848 else:
→ 1849 loss = self.compute_loss(model, inputs)
1851 if self.args.n_gpu > 1:
1852 loss = loss.mean() # mean() to average on multi-gpu parallel training

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/trainer.py:1881, in Trainer.compute_loss(self, model, inputs, return_outputs)
1879 else:
1880 labels = None
→ 1881 outputs = model(**inputs)
1882 # Save past state if it exists
1883 # TODO: this needs to be fixed and made cleaner later.
1884 if self.args.past_index >= 0:

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don’t have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:1198, in RobertaForSequenceClassification.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
1190 r"""
1191 labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
1192 Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ...,
1193 config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
1194 If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1195 """
1196 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
→ 1198 outputs = self.roberta(
1199 input_ids,
1200 attention_mask=attention_mask,
1201 token_type_ids=token_type_ids,
1202 position_ids=position_ids,
1203 head_mask=head_mask,
1204 inputs_embeds=inputs_embeds,
1205 output_attentions=output_attentions,
1206 output_hidden_states=output_hidden_states,
1207 return_dict=return_dict,
1208 )
1209 sequence_output = outputs[0]
1210 logits = self.classifier(sequence_output)

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don’t have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:802, in RobertaModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
799 else:
800 raise ValueError("You have to specify either input_ids or inputs_embeds")
→ 802 batch_size, seq_length = input_shape
803 device = input_ids.device if input_ids is not None else inputs_embeds.device
805 # past_key_values_length

ValueError: too many values to unpack (expected 2)

Thanks in advance for taking the time to read this.
Gianluca

Hi! The first error is a known bug (see Sending a Dataset or DatasetDict to a GPU). Can you update your installation of transformers with pip install -U transformers and rerun the first snippet?
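
As for the second snippet: moving the tensors inside tokenize_function is not needed, because recent versions of the Trainer move every batch to the model's device by themselves. Passing return_tensors="pt" inside map is most likely what causes the second error: each example is then stored with an extra leading batch dimension of 1, so after collation the model receives 3-D input_ids and the batch_size, seq_length = input_shape line in the traceback fails. A rough way to see this, reusing your variable names (the shapes in the comments are what I would expect, not verified output):

# hedged sketch: compare the stored shape for the two tokenize_function variants
print(tokenized_datasets["train"][0]["input_ids"].shape)
# with return_tensors="pt" inside tokenize_function: torch.Size([1, 256])  (extra batch dim)
# without return_tensors="pt":                       torch.Size([256])

So the plain tokenize_function from your first snippet is the one to keep.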


I faced a similar issue too. Updating transformers worked for me.


Thanks! It worked!