Gather Input tensor at index 1 has invalid shape

Hi, I am trying to train a VisionTextDualEncoder on images and captions using 2 GPUs (T4). I know that there are a few corrupted images in my data, so I implemented a custom dataset and collate_fn that skip these corrupted images during training. This works when I use only 1 GPU, but I get this error:

  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/", line 235, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [63, 63], but expected [63, 64]

when I use 2 GPUs, so it must have something to do with the parallel computation across GPUs. Is there any way to make it allow different batch sizes? I have searched for this and looked into the code, but I can't see how to do it.
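
To make the setup concrete, here is a minimal sketch in plain Python (no torch needed) of a collate function that drops failed samples, and of how the resulting short batch might then be split across 2 devices. The names `skip_none_collate` and `split_sizes` are hypothetical, the nominal batch size of 128 is only a guess based on the shapes in the error, and the ceil-sized split is an assumption about how the batch is chunked along dim 0. If each replica returns a [b, b] image-text similarity matrix, the two replicas would end up with shapes that cannot be gathered, much like the ones in the traceback.

```python
import math


def skip_none_collate(samples):
    """Drop samples the dataset returned as None (e.g. unreadable images)."""
    return [s for s in samples if s is not None]


def split_sizes(batch_size, num_devices):
    """Per-device chunk sizes, assuming ceil-sized chunking along dim 0."""
    chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes


# Assumed nominal batch of 128 in which one image failed to load.
batch = skip_none_collate([None if i == 5 else i for i in range(128)])
sizes = split_sizes(len(batch), 2)

# Assumed: each replica returns a [b, b] similarity matrix for its chunk.
logit_shapes = [(b, b) for b in sizes]
print(len(batch), sizes, logit_shapes)
```

With a full batch both replicas would get 64 samples and the shapes would match, which may be why the error only shows up when a corrupted image is skipped.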