I am training a custom model (very similar to Q&A). My inputs are long sentences and my outputs are very short sentences, so my targets can contain several padded sequences. For example, below is my target for a batch of 3. Two of the targets are shorter than the longest one, so I had to pad them with the padding token (0).
[1, 23, 34, 54]
[56, 45, 0, 0]
[32, 1, 2, 0]
So my question is: what is the correct way to calculate the loss? Should I just call F.cross_entropy as it is, or should I mask out the loss at the positions where the padding token 0 appears (e.g. F.cross_entropy(logits, labels, ignore_index=0))?
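To make the question concrete, here is a minimal sketch of the options as I understand them, using the targets above. The logits are random stand-ins for my model output, and the shape (batch, seq_len, vocab_size) and vocab_size = 60 are assumptions for illustration only. As far as I can tell, with the default mean reduction ignore_index averages only over the non-ignored positions, so it should give the same value as an explicit mask.

```python
import torch
import torch.nn.functional as F

PAD_ID = 0  # padding token id used in the targets above

# Targets from the post: batch of 3, sequence length 4.
labels = torch.tensor([[1, 23, 34, 54],
                       [56, 45, 0, 0],
                       [32, 1, 2, 0]])

# Stand-in for the model output (assumed shape: batch, seq_len, vocab_size).
torch.manual_seed(0)
vocab_size = 60  # assumed, just large enough to cover the ids above
logits = torch.randn(3, 4, vocab_size)

# F.cross_entropy accepts (N, C) logits with (N,) targets,
# so flatten the batch and time dimensions together.
flat_logits = logits.reshape(-1, vocab_size)
flat_labels = labels.reshape(-1)

# Option 1: plain loss. Padded positions are treated like real tokens.
loss_plain = F.cross_entropy(flat_logits, flat_labels)

# Option 2: skip every position whose label equals PAD_ID.
loss_ignore = F.cross_entropy(flat_logits, flat_labels, ignore_index=PAD_ID)

# Option 3: explicit mask over per-token losses, averaged over real tokens only.
per_token = F.cross_entropy(flat_logits, flat_labels, reduction="none")
mask = (flat_labels != PAD_ID).float()
loss_masked = (per_token * mask).sum() / mask.sum()

print(loss_plain.item(), loss_ignore.item(), loss_masked.item())
```

One thing I am unsure about: ignore_index=0 only seems safe if 0 is never a real token id in my vocabulary, since it drops every position labelled 0.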
Thank you.