Gather Input tensor at index 1 has invalid shape

karenjw223 · April 7, 2023, 5:02pm

Hi, I am trying to train a VisionTextDualEncoder with some images and captions, using 2 GPUs (T4). However, I know that there are a few corrupted images in my data, so I implemented a custom dataset and collate_fn to skip these corrupted images during training. This works when I am using only 1 GPU, but runs into this error:

  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/comm.py", line 235, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [63, 63], but expected [63, 64]

when I use 2 GPUs, so it must have something to do with the distributed parallel computing. Is there any way to make it allow different batch sizes? I have tried searching for this and looked into the code, but I haven't found a way to do that.

Thanks

prabhatkr · April 18, 2024, 4:09am

How did you solve this?
For anyone else looking for an answer: you need to add the drop_last=True flag, either in the training params or in the dataset loading params.

Refer here
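
To make the shape mismatch concrete, here is a rough pure-Python sketch of the batch arithmetic. It is not the actual torch code: `batch_sizes` and `split_across_gpus` are hypothetical helpers, the dataset size of 1023 is made up, and the ceil-sized chunking is an assumption about how a batch gets divided between GPUs. It does match the 64/63 split in the error above.

```python
import math

def batch_sizes(n_samples, batch_size, drop_last=False):
    """Batch sizes a loader would yield (hypothetical helper, not a torch API)."""
    full, rest = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)  # final, smaller batch is kept
    return sizes

def split_across_gpus(batch, n_gpus):
    """Per-GPU chunk sizes, assuming ceil-sized chunks (an assumption)."""
    chunk = math.ceil(batch / n_gpus)
    sizes = []
    remaining = batch
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

# Made-up numbers: 1023 samples, total batch of 128 spread over 2 GPUs.
last = batch_sizes(1023, 128)[-1]               # 127 samples in the final batch
uneven = split_across_gpus(last, 2)             # [64, 63]: per-GPU outputs differ in shape
kept = batch_sizes(1023, 128, drop_last=True)   # only full batches of 128 remain
even = split_across_gpus(kept[-1], 2)           # [64, 64]
```

In real code the flag would be `drop_last=True` on `torch.utils.data.DataLoader`, or `dataloader_drop_last=True` in `TrainingArguments` when using the Trainer. Note that this only drops the final incomplete batch. A collate_fn that skips samples can shrink any batch, so an odd-sized batch in the middle of an epoch could presumably still trigger the same error.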
