That’s actually a mistake in the documentation: it should say “by shifting the labels” instead of “by shifting the input_ids”. Can you open a PR to fix this?
Sure, I will.
Seems like the implementation is correct.
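For anyone reading along, here is a minimal sketch of what “shifting the labels” could look like. It assumes the sentence in question describes how decoder inputs are built for an encoder-decoder model, which the thread does not state explicitly. The function name `shift_labels_right`, the token ids, and the `-100` ignore value are illustrative assumptions, not the library's actual code.

```python
from typing import List


def shift_labels_right(
    labels: List[int],
    decoder_start_token_id: int,
    pad_token_id: int,
    ignore_index: int = -100,
) -> List[int]:
    """Hypothetical helper: build decoder inputs by shifting the labels right."""
    if not labels:
        return []
    # Prepend the start token and drop the last label, so that position t
    # of the decoder input holds the label from position t - 1.
    shifted = [decoder_start_token_id] + labels[:-1]
    # Positions masked out of the loss (assumed to be marked with
    # ignore_index) are replaced by the padding token.
    return [pad_token_id if tok == ignore_index else tok for tok in shifted]


# Made-up token ids, for illustration only.
labels = [37, 423, 1, -100, -100]
decoder_input_ids = shift_labels_right(
    labels, decoder_start_token_id=0, pad_token_id=0
)
print(decoder_input_ids)  # [0, 37, 423, 1, 0]
```

The point is that the shifted sequence comes from the target labels, not from the encoder's input_ids, which is what the corrected wording says.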
Yes, now everything makes sense, thank you!