Truncated last sentence on summaries

I am running summarization fine-tuning on the latest master-branch version of examples/seq2seq with a custom dataset. However, the last sentence of some of the resulting summaries is truncated. The issue appears to worsen as I increase the dataset size, resulting in a greater proportion of truncated summaries. My parameters are as follows:

'--data_dir=.../data',
'--train_batch_size=1',
'--eval_batch_size=1',
'--output_dir=.../output',
'--num_train_epochs=5',
'--max_target_length=1024',
'--max_source_length=56',
'--model_name_or_path=facebook/bart-large'
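
For context, the cut-off looks like what can happen when generation hits a hard max_length cap before the model produces its end-of-sequence token, rather than a problem with the data itself. Here is a minimal sketch, outside the training script, that only illustrates that behaviour (the checkpoint and input text are placeholders):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

text = "The opacity at the left lung base appears stable from the prior exam."
inputs = tokenizer(text, max_length=56, truncation=True, return_tensors="pt")

# Tight cap: decoding stops as soon as max_length is reached, so the
# output can end mid-sentence.
tight = model.generate(inputs["input_ids"], num_beams=4, max_length=15, early_stopping=True)
# Looser cap: the model is free to reach its own end-of-sequence token.
loose = model.generate(inputs["input_ids"], num_beams=4, max_length=142, early_stopping=True)

print(tokenizer.decode(tight[0], skip_special_tokens=True))
print(tokenizer.decode(loose[0], skip_special_tokens=True))
```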

Here is a very small dataset (500 training instances) with which I was able to reproduce the issue.

Is this expected? Any insights would be helpful. Thank you!

I’m experiencing the same issue with the BART transformer, and I created a Stack Overflow post about it: https://stackoverflow.com/questions/66996270/limiting-bart-huggingface-model-to-complete-sentences-of-maximum-length

Here are some of the output summaries with truncated sentences:

EX 1: The opacity at the left lung base appears stable from prior exam. There is elevation of the left hemidi
EX 2: There is normal mineralization and alignment. No fracture or osseous lesion is identified. The ankle mort
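
One possible workaround is to post-process the decoded text and drop the unfinished trailing sentence. A rough sketch (`drop_unfinished_sentence` is only an illustrative helper, not anything from the library):

```python
def drop_unfinished_sentence(summary: str) -> str:
    """Trim a decoded summary back to its last complete sentence."""
    summary = summary.strip()
    if summary.endswith((".", "!", "?")):
        return summary  # already ends on a sentence boundary
    # Otherwise cut at the last sentence-ending punctuation mark, if any.
    cut = max(summary.rfind("."), summary.rfind("!"), summary.rfind("?"))
    return summary[: cut + 1] if cut != -1 else summary


print(drop_unfinished_sentence(
    "There is normal mineralization and alignment. "
    "No fracture or osseous lesion is identified. The ankle mort"
))
# There is normal mineralization and alignment. No fracture or osseous lesion is identified.
```

Of course this throws away whatever content the clipped sentence was supposed to carry, so it only hides the symptom; if the cut-off really comes from the generation length cap, raising that cap would address the cause.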

Were you able to find a solution to the problem you encountered, @hf324?

I’m facing the same problem and still haven’t found a solution.