Hi Sylvain - I purchased your Transformers book and have been reading it over the past few days. It's a great deep dive into the platform. I was testing the Colab notebook for Chapter 6 on summarization and have hit a few errors I can't seem to fix. When running the fine-tuning code (i.e., trainer.train()) with the notebook code exactly as written, I end up with error A below:
ERROR A // RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
The error goes away when I edit the line "dataset_samsum_pt.set_format(type="torch", columns=columns)" to "dataset_samsum_pt.set_format(type="torch", columns=columns, device=device)", where device is 'cuda', but then I end up with error B below. Can you provide some guidance?
Thanks a lot!
ERROR B: ===================================
The following columns in the training set don’t have a corresponding argument in
PegasusForConditionalGeneration.forward and have been ignored: id, summary, dialogue.
***** Running training *****
Num examples = 14732
Num Epochs = 1
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 16
Total optimization steps = 920
TypeError Traceback (most recent call last)
1 # hide_output
----> 2 trainer.train()
3 score = evaluate_summaries_pegasus(
4 dataset_samsum["test"], rouge_metric, trainer.model, tokenizer,
5 batch_size=2, column_text="dialogue", column_summary="summary")
<__array_function__ internals> in concatenate(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in __array__(self, dtype)
755 return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
756 if dtype is None:
--> 757 return self.numpy()
758 else:
759 return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.