I am observing some strange behaviour when fine-tuning BART and T5 on a summarization task.
I am referring to the following repository:
Dataset:
It is a collection of dictionaries, each with a "text" and a "summary" field. Example:
{"text": "Who is {champion} of {nominee for} {Graduation} ?", "summary": "select ?vr1 where { @@entbegin wd: || graduation @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 } }
T5:
I used the exact code from the GitHub link posted above.
Command used to run it:
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --do_predict \
    --train_file path_to_csv_or_jsonlines_file \
    --validation_file path_to_csv_or_jsonlines_file \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --overwrite_output_dir \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --predict_with_generate
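(For context: the script simply prepends the source_prefix to every input text before tokenization. A simplified paraphrase of its preprocessing, not a verbatim copy and depending on the transformers version:)

# Simplified paraphrase of run_summarization.py's preprocessing:
# the source_prefix ("summarize: ") is prepended to every input text,
# and the summaries are tokenized separately to become the labels.
prefix = "summarize: "

def preprocess_function(examples, tokenizer, max_source_length=1024, max_target_length=128):
    inputs = [prefix + text for text in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=max_source_length, truncation=True)
    labels = tokenizer(text_target=examples["summary"], max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs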
T5 result:
{
"text": "Who is {champion} of {nominee for} {Graduation}?",
"summary": "select ?vr1 where { @@entbegin wd: || graduation @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 }",
"output": "select ?vr1 where { @@entbegin wd: || graduation @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 }",
"score": 1.0
}
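(The "score" field above comes from my own post-processing rather than from run_summarization.py; it is a per-example check of the generated output against the reference summary, along the lines of this sketch:)

# Sketch of the kind of per-example check that yields the 1.0 / 0.0 "score"
# values shown above: exact match on whitespace-normalized strings, applied
# on top of the script's generated predictions.
def exact_match_score(reference: str, prediction: str) -> float:
    normalize = lambda s: " ".join(s.split())
    return 1.0 if normalize(reference) == normalize(prediction) else 0.0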
BART:
For BART I used the same script with the facebook/bart-base model from the Hugging Face Hub, so the command looks like this:
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path facebook/bart-base \
    --do_train \
    --do_eval \
    --do_predict \
    --train_file path_to_csv_or_jsonlines_file \
    --validation_file path_to_csv_or_jsonlines_file \
    --output_dir /tmp/tst-summarization \
    --overwrite_output_dir \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --predict_with_generate
(Compared to the T5 run, I also removed the --source_prefix argument, since the prefix is specific to T5.)
BART result on the same example:
{'text': 'Who is {champion} of {nominee for} {Graduation} ?',
'summary': 'select ?vr1 where { @@entbegin wd: || graduation @@entend @@relbegin wdt: || nominated for @@relend ?vr0 . ?vr0 @@relbegin wdt: || winner @@relend ?vr1 } ',
'output': 'voy go devoidptper n number now when<unk> market said a Services Ambassador<unk> imagine SmithuptperEverybodyptper," will number now while<unk> women byptper, will leave Elev<unk> the Elev<unk> we said there to a will Aviation Mori<unk> the soccer an nice an<unk> will reg an Mori<unk> which<unk>,<unk> from market from<unk> which Golden<unk> from: from<unk> thatVict whichVict identified Co which<unk> the Soccer an adjacent an Mori which GoldenVict identified Co<unk> whichVict the soccer AN Mori<unk> we declared<unk> to an<unk> which identified W which',
'score': 0.0}
As is evident, the BART output is gibberish, and I don't understand the reason behind this. Does BART need to be fine-tuned differently from T5?
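In case it helps with reproducing or diagnosing, this is a minimal standalone way to decode with the fine-tuned BART checkpoint outside the training script (the path and generation settings below are just illustrative):

# Standalone sanity check: load the fine-tuned checkpoint from output_dir
# and generate for one example.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "/tmp/tst-summarization"  # the --output_dir used above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

text = "Who is {champion} of {nominee for} {Graduation} ?"
inputs = tokenizer(text, return_tensors="pt")
generated_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))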
Any lead would be helpful.
TIA!