The task is to generate keywords from sentences.
A keyword may not appear verbatim in the sentences.
So pretraining by inputting masked sentences and predicting the whole sentences does not seem to benefit the keyword generation task; that objective has no direct relation to generating keywords.
Am I right? Is that the reason pretraining does not improve the BLEU score?
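To make the mismatch concrete, here is a small illustrative sketch contrasting the two objectives. The sentences, mask token, and keywords are made up for illustration; they are not from any real dataset or tokenizer.

```python
# Denoising pretraining: reconstruct the original text from a corrupted copy,
# so every target token also appears in (or near) the input.
pretrain_input = "The quick <mask> jumps over the lazy dog."
pretrain_target = "The quick brown fox jumps over the lazy dog."

# Keyword generation: the target keywords need not appear in the input at all.
task_input = "The quick brown fox jumps over the lazy dog."
task_target = "animals; agility"

# The keyword is abstractive: it never occurs verbatim in the source sentence.
assert "animals" not in task_input.lower()
```

This is only meant to illustrate the gap the question points at: the denoising objective trains the model to copy or reconstruct input tokens, while keyword generation often requires producing tokens absent from the input.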
With all due respect, you are asking a question on a forum dedicated to a specific library, transformers by HuggingFace, but the question does not involve that library. In fact, you are using a completely different library. I am not sure if this is the right place for such questions. @sgugger
On the research part of the forum, we welcome any general questions, though of course we would prefer you to use our models. @sshleifer might have some answers, as he is the Bart person on the team.
1. I pad the input tokens with zeros so that multiple sentences can be batched together. The positions of the output tokens should match the input tokens exactly, which means I should keep the padding zeros in the output tokens as well.
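The padding scheme described above can be sketched as follows. This is a minimal illustration with made-up token ids, assuming pad id 0 and right-padding; it is not tied to any particular tokenizer.

```python
PAD_ID = 0  # assumed pad token id


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad each sequence of token ids to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Made-up token ids for two sentences of different lengths.
inputs = [[5, 8, 13], [7, 2]]
targets = [[9, 8, 4], [6, 2]]

padded_inputs = pad_batch(inputs)
padded_targets = pad_batch(targets)

# Padding both sides the same way keeps position i of the output aligned
# with position i of the input, as described above.
assert all(len(x) == len(y) for x, y in zip(padded_inputs, padded_targets))
```

Note that many seq2seq training setups additionally mask the loss at pad positions (for example, transformers ignores label positions set to -100), so the model is not penalized for whatever it predicts there.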