I am running the summarization finetuning script from the latest master-branch version of examples/seq2seq on a custom dataset. However, the last sentence of some of the resulting summaries is truncated. The issue worsens as I increase my dataset size, with a greater proportion of the summaries ending up truncated. My parameters are as follows:
'--data_dir=.../data',
'--train_batch_size=1',
'--eval_batch_size=1',
'--output_dir=.../output',
'--num_train_epochs=5',
'--max_target_length=1024',
'--max_source_length=56',
'--model_name_or_path=facebook/bart-large'
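To rule out the data itself, I checked how many of my reference summaries exceed a given token budget. This is a rough sketch: it uses whitespace tokenization as an approximation of the real subword tokenizer (an assumption, so counts are only indicative), and assumes the examples/seq2seq data layout of one summary per line in a `train.target` file.

```python
from pathlib import Path


def count_over_budget(target_file: str, max_target_length: int) -> int:
    """Count reference summaries longer than max_target_length tokens.

    Whitespace split approximates the subword tokenizer; real token
    counts from the BART tokenizer will generally be higher.
    """
    over = 0
    for line in Path(target_file).read_text().splitlines():
        if len(line.split()) > max_target_length:
            over += 1
    return over
```

For example, `count_over_budget("train.target", 1024)` reports how many targets would be cut off at my `--max_target_length` setting under this approximation.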
Here is a very small dataset (500 training instances) with which I was able to reproduce the issue.
Is this expected? Any insights would be helpful. Thank you!