Can I use CUDA with Trainer.train?

Hello,
I'm having a problem using CUDA with Trainer.train(). I don't run into it when I train models with a plain PyTorch training loop, but I would love to understand what I'm getting wrong so that I can use this powerful class as well.
I'm sure it's something very silly, but I'm a beginner and can't figure out what I'm doing wrong!

Transformers version: 4.11.3
I got this error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

With the following code, which I put together by playing around with the examples from the course:

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset
import torch

def tokenize_function(example):
    return tokenizer(example["sentence"], padding="max_length", truncation=True, max_length=256)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
checkpoint = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

raw_datasets = load_dataset("glue", "cola")
raw_datasets_tokenized = raw_datasets.map(tokenize_function)

# keep only the columns the model expects
tokenized_datasets = raw_datasets_tokenized.remove_columns(["sentence", "idx"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")
tokenized_datasets["train"].column_names  # sanity check on the remaining columns

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2).to(device)
training_args = TrainingArguments("test-trainer")

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()
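
For reference, a quick check like the one below (reusing the objects defined above; the printed devices are what I would expect, not verified output) shows the mismatch the error message is pointing at: the model parameters are on the GPU while the formatted dataset still returns CPU tensors.

# rough diagnostic sketch, reusing the objects defined above
print(next(model.parameters()).device)                      # expected: cuda:0
print(tokenized_datasets["train"][0]["input_ids"].device)   # expected: cpu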

I understood that this happens because my data is not on CUDA, so I tried:

def tokenize_function(example):
    return tokenizer(example["sentence"], padding="max_length", truncation=True, max_length=256, return_tensors="pt").to(device)

but when calling trainer.train(), I got ValueError: too many values to unpack (expected 2):

ValueError                                Traceback (most recent call last)
Input In [21], in ()
----> 1 trainer.train()

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/trainer.py:1316, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1314 tr_loss_step = self.training_step(model, inputs)
1315 else:
→ 1316 tr_loss_step = self.training_step(model, inputs)
1318 if (
1319 args.logging_nan_inf_filter
1320 and not is_torch_tpu_available()
1321 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1322 ):
1323 # if loss is nan or inf simply add the average of previous logged losses
1324 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/trainer.py:1849, in Trainer.training_step(self, model, inputs)
1847 loss = self.compute_loss(model, inputs)
1848 else:
→ 1849 loss = self.compute_loss(model, inputs)
1851 if self.args.n_gpu > 1:
1852 loss = loss.mean() # mean() to average on multi-gpu parallel training

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/trainer.py:1881, in Trainer.compute_loss(self, model, inputs, return_outputs)
1879 else:
1880 labels = None
→ 1881 outputs = model(**inputs)
1882 # Save past state if it exists
1883 # TODO: this needs to be fixed and made cleaner later.
1884 if self.args.past_index >= 0:

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don’t have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:1198, in RobertaForSequenceClassification.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
1190 r"""
1191 labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
1192 Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ...,
1193 config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
1194 If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1195 """
1196 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
→ 1198 outputs = self.roberta(
1199 input_ids,
1200 attention_mask=attention_mask,
1201 token_type_ids=token_type_ids,
1202 position_ids=position_ids,
1203 head_mask=head_mask,
1204 inputs_embeds=inputs_embeds,
1205 output_attentions=output_attentions,
1206 output_hidden_states=output_hidden_states,
1207 return_dict=return_dict,
1208 )
1209 sequence_output = outputs[0]
1210 logits = self.classifier(sequence_output)

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don’t have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/huggingface/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:802, in RobertaModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
799 else:
800 raise ValueError("You have to specify either input_ids or inputs_embeds")
→ 802 batch_size, seq_length = input_shape
803 device = input_ids.device if input_ids is not None else inputs_embeds.device
805 # past_key_values_length

ValueError: too many values to unpack (expected 2)

Thanks in advance for taking the time to read this.
Gianluca

Hi! The first error is a known bug (see Sending a Dataset or DatasetDict to a GPU). Can you update your installation of transformers with pip install -U transformers and rerun the first snippet?
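
As for the second snippet: moving the tensors inside tokenize_function is not needed, because recent versions of the Trainer move every batch to the model's device by themselves. Passing return_tensors="pt" inside map is most likely what causes the second error: each example is then stored with an extra leading batch dimension of 1, so after collation the model receives 3-D input_ids and the batch_size, seq_length = input_shape line in the traceback fails. A rough way to see this, reusing your variable names (the shapes in the comments are what I would expect, not verified output):

# hedged sketch: compare the stored shape for the two tokenize_function variants
print(tokenized_datasets["train"][0]["input_ids"].shape)
# with return_tensors="pt" inside tokenize_function: torch.Size([1, 256])  (extra batch dim)
# without return_tensors="pt":                       torch.Size([256])

So the plain tokenize_function from your first snippet is the one to keep.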


I faced a similar issue too. Updating transformers worked for me.


Thanks! It worked!