I’ve updated my previous comment. It might make more sense to use the BOS token, rather than the padding token, as the decoder start token.
This was also done for this demo.
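To make the difference concrete, here is a minimal pure-Python sketch of how decoder inputs are usually built from the labels by shifting them one position to the right. The token ids (`PAD_ID`, `BOS_ID`, `EOS_ID`) and the helper name `shift_tokens_right` are assumptions chosen for illustration, not values taken from the demo or from any particular tokenizer. If this is a Hugging Face Transformers setup (also an assumption), the equivalent change would normally be made through the model config's `decoder_start_token_id`.

```python
from typing import List

IGNORE_INDEX = -100  # label value commonly used to mask positions out of the loss


def shift_tokens_right(
    labels: List[int], pad_token_id: int, decoder_start_token_id: int
) -> List[int]:
    """Build decoder inputs: prepend the start token and drop the last label."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # Masked label positions cannot be fed to the decoder, so use padding there.
    return [pad_token_id if tok == IGNORE_INDEX else tok for tok in shifted]


# Hypothetical token ids, for illustration only.
PAD_ID, BOS_ID, EOS_ID = 0, 1, 2
labels = [15, 27, 9, EOS_ID, IGNORE_INDEX, IGNORE_INDEX]

inputs_with_bos = shift_tokens_right(labels, PAD_ID, decoder_start_token_id=BOS_ID)
inputs_with_pad = shift_tokens_right(labels, PAD_ID, decoder_start_token_id=PAD_ID)

print(inputs_with_bos)  # [1, 15, 27, 9, 2, 0]
print(inputs_with_pad)  # [0, 15, 27, 9, 2, 0]
```

With the padding token as the start token, the first decoder input is indistinguishable from trailing padding, whereas with BOS it is a dedicated start marker.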