ViT problem on GPU: feature extractor requires the image to be numpy

Hello,
I'm using ViT from your package, specifically facebook/deit-base-patch16-384. When I test it on CPU the whole workflow runs fine, but when I test it on GPU I get this error:

Using GPU
Trainable param: 105521444
Traceback (most recent call last):
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\train.py", line 97, in
pred_points = model_gcn(graph, pool)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\model\mesh_network.py", line 29, in forward
features = pool(elli_points, self.feat_extr, self.transf)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\utils\pool.py", line 16, in call
feat_conv3, feat_conv4, feat_conv5 = feat_extr(self.im)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\model\image\transformer.py", line 14, in forward
inputs = self.feature_extractor(x[0], return_tensors="pt")
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\transformers\models\vit\feature_extraction_vit.py", line 141, in call
images = [self.resize(image=image, size=self.size, resample=self.resample) for image in images]
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\transformers\models\vit\feature_extraction_vit.py", line 141, in
images = [self.resize(image=image, size=self.size, resample=self.resample) for image in images]
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\transformers\image_utils.py", line 218, in resize
image = self.to_pil_image(image)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\transformers\image_utils.py", line 104, in to_pil_image
image = image.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I tried to modify the image_utils.py file at line 104, changing the call to .cpu().clone().numpy(), but with that change in place the error becomes:

Using GPU
Trainable param: 105521444
Traceback (most recent call last):
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\train.py", line 97, in
pred_points = model_gcn(graph, pool)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\model\mesh_network.py", line 29, in forward
features = pool(elli_points, self.feat_extr, self.transf)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\utils\pool.py", line 16, in call
feat_conv3, feat_conv4, feat_conv5 = feat_extr(self.im)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\model\image\transformer.py", line 16, in forward
outputs = self.model(**inputs, output_hidden_states=True)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\transformers\models\vit\modeling_vit.py", line 572, in forward
embedding_output = self.embeddings(
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\transformers\models\vit\modeling_vit.py", line 135, in forward
embeddings = self.patch_embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\transformers\models\vit\modeling_vit.py", line 191, in forward
x = self.projection(pixel_values).flatten(2).transpose(1, 2)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
return self._conv_forward(input, self.weight, self.bias)
File "D:\WorkingDirectory\UNIVPM\Progetti\ComputerGraphics\3DGen\myversion\pixel2mesh-geometric\p2m\venv\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

So I don't understand: if the image I pass in is already a torch tensor, why does it need to be converted to a numpy array?
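From the two tracebacks it looks like the feature extractor does its resizing on the host via PIL/numpy, so it needs a CPU tensor, and the pixel_values it returns are also on the CPU, which then clash with the CUDA model weights. Below is a minimal sketch of the usual pattern, not the actual fix to this repository: x and device are placeholder names, and the dict assignment stands in for what the real feature_extractor call would return:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the image batch that lives on the GPU in the real code
x = torch.rand(1, 3, 384, 384, device=device)

# 1) Copy the image to host memory before preprocessing; in the real
#    code this would be: inputs = self.feature_extractor(x[0].cpu(), return_tensors="pt")
image_cpu = x[0].cpu()
inputs = {"pixel_values": image_cpu.unsqueeze(0)}  # pretend preprocessing output

# 2) Move every preprocessed tensor back to the model's device so the
#    conv weights (cuda) and the input (cpu) no longer mismatch
inputs = {k: v.to(device) for k, v in inputs.items()}

# Now a CUDA model could consume inputs without the FloatTensor /
# cuda.FloatTensor RuntimeError
assert inputs["pixel_values"].device.type == device.type
```

This avoids patching image_utils.py inside the installed package: the .cpu() happens at the call site, and the .to(device) restores the tensors to the GPU before the forward pass.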

Can you help me please?

Can someone please help me?

Do you have a code snippet to reproduce your error?

I can share my GitHub code, but privately for the moment, and you can reproduce the environment with the data. If you agree, please let me know your email.