T5 outperforms BART when fine-tuned for summarization task

sarthak09 · July 5, 2022, 9:38am

I was observing a strange behaviour with the fine-tuned model of BART and T5 on the summarization task.
I am referring to the following repository:

Dataset:

It is a collection of dictionaries. Example:-
{"text": "Who is {champion} of {nominee for} {Graduation} ?", "summary": "select ?vr1 where  {  @@entbegin wd: || graduation  @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 } }

T5:
I used the exact code in the above posted github link.
Python code for running:
python examples/pytorch/summarization/run_summarization.py
–model_name_or_path t5-small
–do_train
–do_eval
–do_predict
–train_file path_to_csv_or_jsonlines_file
–validation_file path_to_csv_or_jsonlines_file
–source_prefix "summarize: "
–output_dir /tmp/tst-summarization
–overwrite_output_dir
–per_device_train_batch_size=4
–per_device_eval_batch_size=4
–predict_with_generate

T5_result:

{

"text": "Who is {champion} of {nominee for} {Graduation}?",

"summary": "select ?vr1 where { @@entbegin wd: || graduation @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 }",

"output": "select ?vr1 where { @@entbegin wd: || graduation @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 }",

"score": 1.0

}

BART:
In the above python code I used the model facebook/bart-base of the huggingface models and so the code to run looks like this:
python examples/pytorch/summarization/run_summarization.py
–model_name_or_path facebook/bart-base
–do_train
–do_eval
–do_predict
–train_file path_to_csv_or_jsonlines_file
–validation_file path_to_csv_or_jsonlines_file
–output_dir /tmp/tst-summarization
–overwrite_output_dir
–per_device_train_batch_size=4
–per_device_eval_batch_size=4
–predict_with_generate

(I also removed the source_prefix argument from the python running code compared to what was used in T5’s)

BART_result on the same example:

{'text': 'Who is {champion} of {nominee for} {Graduation} ?',
 'summary': 'select ?vr1 where  {  @@entbegin wd: || graduation  @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 } ', 
 'output': 'voy go devoidptper n number now when<unk> market said a Services Ambassador<unk> imagine SmithuptperEverybodyptper," will number now while<unk> women byptper, will leave Elev<unk> the Elev<unk> we said there to a will Aviation Mori<unk> the soccer an nice an<unk> will reg an Mori<unk> which<unk>,<unk> from market from<unk> which Golden<unk> from: from<unk> thatVict whichVict identified Co which<unk> the Soccer an adjacent an Mori which GoldenVict identified Co<unk> whichVict the soccer AN Mori<unk> we declared<unk> to an<unk> which identified W which', 
'score': 0.0}

As evident bart results are gibberish. I don’t understand what’s the reason behind this. Is there some other way to fine-tune BART model compared to T5?
Any lead would be helpful.
TIA!

chrisdoyle · July 20, 2022, 10:58am

First observation - gradients are likely being updated somewhere because a default BART mightn’t spurt out a load of gibberish. I’d start by double checking that all the special characters are recongised as tokens by the tokeniser - you might need to add them

sarthak09 · July 20, 2022, 11:39am

Hi I later on used the fariseq repository for summarization on bart model which gives better results than the huggingface but still T5 is more accurate. I wonder what can be the reason for this.
Check the latest bart result:
“text”: “Who is {champion} of {nominee for} {Graduation}?”,

“summary”: “select ?vr1 where { @@entbegin wd: || graduation @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 }”,

“BART-output”: ?vr1 where { @@entbegin wd: || josephe @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 }

Any clue why T5 is having a better output than bart?

chrisdoyle · August 8, 2022, 4:15pm

Different tokenisers maybe - I’d start by double checking that all the special characters are recongised as tokens by the tokeniser - you might need to add them to the BART tokenizer

Topic		Replies	Views
T5 model for summarization far from SOTA results Models	0	1344	July 2, 2021
Reproduce the result of Bart in summarization task Models	0	215	March 17, 2023
Use Pretrained T5 for Summarization Beginners	3	636	July 2, 2021
Summarization taks, looking for clarifications before getting started Beginners	10	973	February 16, 2021
Run_summarization.py t5 model output inconsistent results Models	0	235	September 22, 2023

T5 outperforms BART when fine-tuned for summarization task

Related topics