I implemented my own collate_fn to pad both my input and target sequences. My program hangs during evaluation when calling gather(). I checked and found that the batches distributed to each GPU were not consistently padded: each batch is padded to the longest sequence within that batch, so the padded lengths differ across GPUs.
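For reference, my collate_fn looks roughly like this (a simplified sketch; PAD_ID and the (input, target) tuple format stand in for my actual setup):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # placeholder for my actual pad token id

def collate_fn(batch):
    # batch: list of (input_ids, target_ids) 1-D tensors of varying length
    inputs, targets = zip(*batch)
    # pad_sequence pads to the longest sequence *within this batch*,
    # so the padded length can differ from batch to batch (and GPU to GPU)
    inputs = pad_sequence(inputs, batch_first=True, padding_value=PAD_ID)
    targets = pad_sequence(targets, batch_first=True, padding_value=PAD_ID)
    return inputs, targets
```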
I think this must be a common situation when training seq2seq models with distributed data parallelism, but I can't find a solution. Is there a way to ensure consistent padding across GPUs?
Thanks!