RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument index in method wrapper_index_select)

I am working in a Google Colab session with a HuggingFace DistilBERT model which I have fine-tuned on some data.

I am getting the following error when I try to evaluate a restored copy of my model:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument index in method wrapper_index_select)

I run the following piece of code twice: once just after fitting the model, and once after saving and restoring the model.

import torch
from datasets import load_metric

metric = load_metric("accuracy")
model.eval()
for batch in test_dataloader:
    # `device` is the torch.device the model was moved to during training
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])

metric.compute()

If I run the evaluation straight after training, there is no problem:

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:10: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  # Remove the CWD from sys.path while we load stuff.
{'accuracy': 0.6692307692307692}

If I run the above code after saving and restoring the model, I get the error quoted above. The full traceback is:

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:10: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  # Remove the CWD from sys.path while we load stuff.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-102-33bf1579632a> in <module>()
      4     batch = {k: v.to(device) for k, v in batch.items()}
      5     with torch.no_grad():
----> 6         outputs = model(**batch)
      7 
      8     logits = outputs.logits

(8 intermediate stack frames omitted)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2041         # remove once script supports set_grad_enabled
   2042         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2043     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2044 
   2045 

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument index in method wrapper_index_select)

The steps I take for saving and restoring are as follows (a rough sketch of steps 2 and 3 appears after the step 1 code below):

  1. Write the model to the Colab session’s local disc
  2. Copy from the local disc of the Colab session to Google Drive
  3. Copy back from Google Drive to the Colab session’s local disc
  4. Use the copy on the local disc to load the model

The code for step 1 has been adapted from run_glue.py and is as follows:

import os

# Saving best practices: if you use default names for the model,
# you can reload it using from_pretrained()

output_dir = './a_local_copy/'

# Create output directory if needed
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# logger.info("Saving model checkpoint to %s", args.output_dir)
print("Saving model checkpoint to %s" % output_dir)

# Save a trained model, configuration and tokenizer using `save_pretrained()`.
# They can then be reloaded using `from_pretrained()`
model_to_save = model.module if hasattr(model, 'module') else model  # Take care of distributed/parallel training
model_to_save.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

# Good practice: save your training arguments together with the trained model
# torch.save(args, os.path.join(output_dir, 'training_args.bin'))
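
Exactly how the Drive copy is done isn’t shown above; roughly, steps 2 and 3 are plain file copies along these lines (the Drive path is illustrative):

from google.colab import drive
import shutil

# Step 2: mount Google Drive and copy the saved model folder onto it
drive.mount('/content/gdrive')
shutil.copytree('./a_local_copy/', '/content/gdrive/MyDrive/a_drive_copy/')

# Step 3 (typically in a later, fresh session): copy the folder back to local disc
shutil.copytree('/content/gdrive/MyDrive/a_drive_copy/', './a_local_copy/')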

Step 4 is straightforward:

model = AutoModelForSequenceClassification.from_pretrained(output_dir)
tokenizer = AutoTokenizer.from_pretrained(output_dir)

I am happy to post further code if you could give me some guidance as to what would be useful.


I think after you load the model, it is no longer on the GPU. Try:
model = AutoModelForSequenceClassification.from_pretrained(output_dir).to(device)
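
In other words, step 4 becomes roughly this (assuming device is the same torch.device used during training):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# from_pretrained() loads the weights onto the CPU, so move the model
# back to the GPU before evaluating
model = AutoModelForSequenceClassification.from_pretrained(output_dir).to(device)
tokenizer = AutoTokenizer.from_pretrained(output_dir)
model.eval()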


Perfect - that fixed it - thank you, Eyup!


Hello, I am new here because I get the same message after installing Stable Diffusion 1.5. I have two GPUs, one from Intel and my NVIDIA card, and apparently the installation does not recognize the correct card. Where can I paste your code above? I’m not a PC professional, nor do I have any programming skills. Thanks for the help; I’ve been working on this for two days…

Hi @ehalit, I am also facing a similar issue when training on 2 or more GPUs. Do I need to change my DemoDataset(Dataset) class? RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

Kindly help, please!

Hi, I have met a similar problem when trying to run the image-to-text generation demo of BLIP-2 (blip-2 demo). Because of network problems when downloading, I use an offline copy of the model.

Here is my code:

import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# setup device to use
device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
# load sample image
raw_image = Image.open("./demo.jpg")

# loads BLIP-2 pre-trained model
vis_processors = Blip2Processor.from_pretrained("xxx/models/blip2-flan-t5-xxl")
model = Blip2ForConditionalGeneration.from_pretrained("xxx/models/blip2-flan-t5-xxl", device_map="auto")

raw_question = "Question: Which city is this?"
inputs = vis_processors(raw_image, raw_question, return_tensors="pt").to("cuda")
out = model.generate(**inputs)
print(vis_processors.decode(out[0], skip_special_tokens=True))

The error message is:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!

I have tried the solutions from the related GitHub issues, but I still get the same error.

The hardware I am using is a cluster of eight 3090 GPUs.

Thanks a lot in advance.
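
One thing worth checking (an untested sketch, not a confirmed fix): with device_map="auto" the model is sharded across the GPUs, so the hard-coded .to("cuda") (which means cuda:0) may not match the device that holds the model’s first layers. Sending the inputs to model.device instead targets the shard holding the embedding layer:

# Untested sketch: with a sharded model, send inputs to the device of the
# model's first parameters instead of hard-coding "cuda" (i.e. cuda:0)
inputs = vis_processors(raw_image, raw_question, return_tensors="pt").to(model.device)
out = model.generate(**inputs)
print(vis_processors.decode(out[0], skip_special_tokens=True))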