@sshleifer Progress Update Aug 4 -> Aug 19

In the spirit of openness (and inspired by Sylvain’s Google Doc) I’m posting my standup/progress update openly instead of in a Google Doc.

Big thank yous to @sgugger for 35 code reviews, @julien-c for moonlanding support with Marian, and @valhalla and @stas for help in so many areas over the past few weeks.

Accomplishments:
  • #6340 12 new Pegasus checkpoints - SOTA summarization on many new datasets.
  • #6342 357 new Marian checkpoints, 48 upgrades = our 1st experiment with model versioning.
    The conversion code uses very rough heuristics to check whether we already have an identical model (across naming schemes), and either converts, upgrades, or ignores based on the BLEU score posted in the README.md. This PR also added fairly involved, automated model-carding.
  • #6526 Partially undo the `adjust_logits_during_generation` hack: metric improvements across the board. We are now pretty close to metric parity.
  • Successfully distilled mBART on English-Romanian. Distillation works on MT! See `from_pretrained("sshleifer/distilbart-enro-12-6")` or the 12-4 variant.
  • Lots of DM onboarding/QA with @stas. He is now porting fairseq’s SOTA translator, after many smaller helpful bug fixes.
  • Lots of maintenance while others were on vacation.
  • Secret support mission going well.
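The Marian convert/upgrade/ignore heuristic from #6342 can be sketched roughly as follows (the function name and tolerance are hypothetical; the real conversion script is more involved):

```python
def versioning_decision(new_bleu, posted_bleu, tol=0.05):
    """Rough sketch of the convert/upgrade/ignore heuristic: compare a
    freshly converted checkpoint's BLEU to the score posted in the
    existing model's README. Hypothetical names, illustration only."""
    if posted_bleu is None:
        return "convert"   # no existing model under any naming scheme
    if new_bleu > posted_bleu + tol:
        return "upgrade"   # new checkpoint is measurably better
    return "ignore"        # identical or worse: keep the old one

# Usage
assert versioning_decision(28.1, None) == "convert"
assert versioning_decision(28.1, 27.5) == "upgrade"
assert versioning_decision(27.5, 27.5) == "ignore"
```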

Problems:
  • I am close on TFBart (4 common tests failing, integration test passing) and could use some help. I think I need to adopt the huge signature used by `TFT5ForConditionalGeneration`. After this, TFPegasus, TFMbart, etc. will be easier.
  • Our Seq2Seq dataloader is slower than fairseq’s because it doesn’t use dynamic batch sizes.
  • External contributors cannot get TPU + seq2seq working; I may take a pass at it.
  • After a huge amount of effort and many 24h experiments, our Seq2Seq finetune metrics are still slightly below fairseq’s (by 0.2 ROUGE on XSum, 1 BLEU on wmt_en_ro). I suspect the dataloader is the cause.
  • Slow GPU CI has been broken for 10 days.
  • Haven’t carved out time for research in a while.
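For context on the dataloader gap: fairseq batches by a token budget (its `--max-tokens` flag) rather than by a fixed number of examples, so short sequences get packed into larger batches. A rough pure-Python sketch of that idea (hypothetical function, not our dataloader or fairseq’s actual code):

```python
def dynamic_batches(lengths, max_tokens):
    """Group example indices (sorted by length) into batches whose padded
    size -- batch_size * longest_example -- stays under max_tokens.
    Sketch of fairseq-style dynamic batching, not a real implementation."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, batch, longest = [], [], 0
    for i in order:
        longest = max(longest, lengths[i])
        if batch and (len(batch) + 1) * longest > max_tokens:
            batches.append(batch)
            batch, longest = [], lengths[i]
        batch.append(i)
    if batch:
        batches.append(batch)
    return batches
```

Sorting by length first keeps padding waste low; short examples end up together in big batches, long ones in small batches, so every batch uses a similar amount of GPU memory.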

Priorities:
  1. get marian, pegasus tweeted and “released”. Support as needed.
  2. tfbart
  3. CARVE OUT RESEARCH TIME: freezing idea.
  4. seq2seq dataloader - dynamic batch size, cleanup to always use `prepare_seq2seq_batch`
  5. seq2seq finetune on TPU (through Trainer!)
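For context on priority 3’s freezing idea: finetuning with part of the model frozen (e.g. the encoder or the embeddings) cuts time and memory at a small quality cost. A minimal PyTorch sketch with a toy stand-in model; `freeze_params` is a hypothetical helper, not necessarily the finetune script’s actual API:

```python
import torch.nn as nn

def freeze_params(module: nn.Module):
    """Stop gradients through a submodule so finetuning only updates
    the rest of the model (hypothetical helper, illustration only)."""
    for p in module.parameters():
        p.requires_grad = False

# Toy stand-in for a seq2seq model: freeze the "encoder", train the "decoder".
model = nn.ModuleDict({"encoder": nn.Linear(4, 4), "decoder": nn.Linear(4, 4)})
freeze_params(model["encoder"])
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
# only the decoder's weight and bias remain trainable
```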

Already sounds like great progress! I’m very interested in the seq2seq stuff these days, and I miss the customizability of onmt a bit - I made some efforts, but everything there is very interconnected (data and models), so customization is not very straightforward. Having great seq2seq in transformers would be helpful for me.

About SOTA on summarization: are you planning to jot down your findings in a paper or blog post? Might be an interesting read!

Not really my findings, just the Pegasus paper and a bug fix. We used to force Bart and all its children (mBART, Pegasus) to generate bos after decoder_start_token_id, because fairseq does that.
With @valhalla’s help, we figured out that this only helps for bart-large-cnn, so I added a config attribute `force_bos_token_to_be_generated`, which defaults to False but is set to True for bart-large-cnn and distilbart-cnn*.
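Mechanically, the fix lives in the logits adjustment during generation: at the first decoding step after decoder_start_token_id, every logit except bos gets masked, so bos is guaranteed to be generated. A pure-Python sketch of the idea (hypothetical function, not Bart’s actual adjust_logits_during_generation):

```python
import math

def adjust_logits(logits, cur_len, bos_token_id, force_bos=True):
    """Sketch of the force-bos behavior: at the first generation step
    (right after decoder_start_token_id), mask every logit except bos.
    With force_bos=False the logits pass through untouched."""
    if force_bos and cur_len == 1:
        return [l if i == bos_token_id else -math.inf
                for i, l in enumerate(logits)]
    return logits

# bos (id 0) wins at step 1 only when forcing is on:
step1 = adjust_logits([0.1, 2.0, 3.0], cur_len=1, bos_token_id=0)
```

Making this a config attribute means checkpoints that benefit from the fairseq behavior (bart-large-cnn) keep it, while everyone else gets the unconstrained logits.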

Metrics Impact

| Checkpoint | Metric | Change | New score |
|---|---|---|---|
| facebook/bart-large-xsum | ROUGE-2 | +0.6 | 22.38 |
| facebook/mbart-large-enro | BLEU | +0.3 | 28.15 |
| google/pegasus-xsum | ROUGE-2 | +0.3 | 24.43 |
| facebook/bart-large-cnn | ROUGE-2 | 0 | unchanged |


> get marian, pegasus tweeted and “released”. Support as needed.

Is the latest release coming soon? Let me know when to tweet.


@sshleifer Thank you for answering all of my questions so promptly!


Me too - I’ve been getting amazing and caring support from @sshleifer! I appreciate you, Sam!