I trained mBART for summarization and generated outputs with several different decoding methods. I found that nucleus sampling ends up copying entire sentences verbatim from the input … My training data is not at all extractive, and the other decoding methods didn't do this. Is this common behavior for nucleus sampling? I understand that the decoder samples from the smallest set of tokens whose cumulative probability exceeds p, but how can it end up reproducing exact sequences from the input?
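For reference, here is a minimal, framework-independent sketch of what one nucleus (top-p) sampling step does. It is not mBART's or any library's actual implementation. The helper names `nucleus_filter` and `nucleus_sample`, the toy tokens, and the probabilities are all hypothetical and only for illustration. The sketch shows one relevant property: when the model puts most of its probability mass on a single token, the nucleus contains only that token, so that step is effectively deterministic.

```python
import random
from typing import Dict, List, Tuple


def nucleus_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p, then renormalize them (top-p / nucleus filtering)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # nucleus is complete once mass >= p
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def nucleus_sample(probs: Dict[str, float], p: float, rng: random.Random) -> str:
    """Sample one token from the renormalized nucleus."""
    filtered = nucleus_filter(probs, p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distributions for a single decoding step.
peaked = {"the": 0.95, "a": 0.03, "an": 0.02}  # one dominant token
flat = {"the": 0.40, "a": 0.35, "an": 0.25}    # mass spread out

print(nucleus_filter(peaked, 0.9))  # only "the" survives
print(nucleus_filter(flat, 0.9))    # all three tokens survive
```

With `p=0.9`, the peaked distribution yields a one-token nucleus, so sampling picks the same token every time. The flat distribution keeps all three candidates, so sampling stays random.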
Thanks in advance for your input.