Num_beams: Faster Summarization without Distillation

sshleifer · November 12, 2020, 12:37am

For many seq2seq models in the hub, num_beams can be set meaningfully lower without hurting metrics.
For xsum and cnn_dailymail, I tried a bunch of different values and settled on the ones below (models where the default is already good aren’t listed). The defaults are 8 for all pegasus models, 6 for bart*xsum and 4 for bart*cnn. It’s not clear whether the defaults should be changed from the published parameters (though it would be nice to save compute for pipelines and the inference API), so I figured I’d just post this for people who want faster inference. The speedups are substantial: between 20% and 100%. Lowering num_beams tends to be easier on cnn_dailymail than on xsum.

google/pegasus-cnn_dailymail: 4
sshleifer/distill-pegasus-cnn-16-4: 4
sshleifer/pegasus-cnn-ft-v2: 4
sshleifer/distilbart-cnn-12-3: 3
sshleifer/distilbart-cnn-12-6: 2
sshleifer/distilbart-cnn-6-6: 2
sshleifer/distill-pegasus-xsum-16-4: 4
sshleifer/distill-pegasus-xsum-12-12: 4
facebook/bart-large-cnn
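
For anyone who wants to try these values, here is a minimal sketch of passing a lower num_beams to `generate`. It assumes `transformers`, `torch`, and `sentencepiece` are installed. The `SUGGESTED_NUM_BEAMS` dict and the `summarize` helper are hypothetical names made up for this example, not part of the library. facebook/bart-large-cnn is left out of the dict because no value is given for it above.

```python
import time

# Values copied from the list above. facebook/bart-large-cnn is omitted
# because the post does not give a number for it.
SUGGESTED_NUM_BEAMS = {
    "google/pegasus-cnn_dailymail": 4,
    "sshleifer/distill-pegasus-cnn-16-4": 4,
    "sshleifer/pegasus-cnn-ft-v2": 4,
    "sshleifer/distilbart-cnn-12-3": 3,
    "sshleifer/distilbart-cnn-12-6": 2,
    "sshleifer/distilbart-cnn-6-6": 2,
    "sshleifer/distill-pegasus-xsum-16-4": 4,
    "sshleifer/distill-pegasus-xsum-12-12": 4,
}


def summarize(texts, model_name, num_beams=None):
    """Hypothetical helper: returns (summaries, generation time in seconds)."""
    # Imported lazily so the table above is usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    if num_beams is None:
        # Look up the suggested value; stays None for unlisted models.
        num_beams = SUGGESTED_NUM_BEAMS.get(model_name)
    # With no value, pass nothing and let the model's own default apply.
    gen_kwargs = {} if num_beams is None else {"num_beams": num_beams}

    batch = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)
    start = time.perf_counter()
    summary_ids = model.generate(**batch, **gen_kwargs)
    elapsed = time.perf_counter() - start

    summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
    return summaries, elapsed
```

Calling `summarize(docs, "sshleifer/distilbart-cnn-12-6")` and then again with `num_beams=4` gives a rough timing comparison on your own hardware. Check rouge on your own data before relying on the lower value.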

Here are some rouge2 vs num_beams plots for different models:

[Plot: XSUM]

[Plot: CNN]

Another note:

facebook/bart-large-xsum: prefix=" " hurts rouge2 by .02 and should be removed. It has no impact on facebook/bart-large-cnn.

stas · November 12, 2020, 1:06am

Awesome sharing, @sshleifer!

Perhaps let’s add these notes to README.md? Otherwise it’d be difficult to remember that this is on the forums. Or perhaps create an OPTIMIZATIONS.md with various such performance notes, so that the README focuses on functionality and the latter on tips and tricks.
