In the spirit of openness (and inspired by Sylvain's Google Doc), I'm posting my standup/progress update openly instead of on Google Docs.
Big thank yous to @sgugger for 35 code reviews, @julien-c for moonlanding support with Marian, and @valhalla and @stas for help in so many areas over the past few weeks.
Successes/Done:
- #6340: 12 new Pegasus checkpoints, SOTA summarization on many new datasets.
- #6342: 357 new Marian checkpoints, 48 upgrades. This was our first experiment with model versioning: the conversion code uses very rough heuristics to check whether we already have an identical model (across naming schemes), and either converts, upgrades, or ignores based on the BLEU score posted in README.md (decision sketched after this list). This also added very involved, automated model-carding.
- #6526: Partially undid the `adjust_logits_during_generation` hack; metric improvements across the board. We are now pretty close to metric parity.
- Successfully distilled mBART English to Romanian. Distillation works on MT! See `from_pretrained("sshleifer/distilbart-enro-12-6")` or the 12-4 variant (loading example after this list).
- Lots of DM onboarding/QA with @stas. He is porting fairseq's SOTA translator now, after many smaller helpful bug fixes.
- Lots of maintenance while others were on vacation.
- Secret support mission going well.
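For concreteness, here is a hypothetical sketch of the convert/upgrade/ignore decision described in the Marian versioning bullet above. The function, its arguments, and the example numbers are all illustrative; the real conversion script's heuristics are much rougher.

```python
from typing import Optional


def decide(new_bleu: float, existing_bleu: Optional[float], identical: bool) -> str:
    """Illustrative decision for a candidate converted checkpoint.

    new_bleu:      BLEU posted in the candidate's README.md
    existing_bleu: BLEU of the checkpoint already on the hub (None if absent)
    identical:     a rough heuristic says the weights match an existing model
    """
    if existing_bleu is None:
        return "convert"  # nothing on the hub yet: convert and upload
    if identical:
        return "ignore"   # same model under a different naming scheme
    if new_bleu > existing_bleu:
        return "upgrade"  # better BLEU: replace the existing checkpoint
    return "ignore"       # not an improvement: keep what we have


print(decide(new_bleu=38.1, existing_bleu=36.4, identical=False))  # "upgrade"
```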
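To try the distilled en-ro checkpoint, something like this should work (a minimal sketch using the standard `AutoTokenizer`/`AutoModelForSeq2SeqLM` API; only the checkpoint id comes from this post):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "sshleifer/distilbart-enro-12-6"  # or the 12-4 variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate an English sentence to Romanian
batch = tokenizer(["UN Chief Says There Is No Military Solution in Syria"], return_tensors="pt")
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```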
Struggles:
- I am close on tfbart (4 common tests failing, integration test passing) and could use some help. I think I need the huge call signature that `TFT5ForConditionalGeneration` uses. After this, tfpegasus, tfmbart, etc. will be easier.
- Our Seq2Seq dataloader is slower than fairseq's because it doesn't use dynamic batch sizes (see the sketch after this list).
- External contributors cannot get TPU + seq2seq working; I may take a pass at it.
- After a huge amount of effort and many 24-hour experiments, our Seq2Seq finetuning metrics are still slightly below fairseq's (0.2 ROUGE on XSum, 1 BLEU on wmt_en_ro). I suspect the dataloader is the cause.
- Slow GPU CI has been broken for 10 days.
- Haven’t carved out time for research in a while.
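As a reference point for the dataloader struggle above, here is a hypothetical sketch of fairseq-style dynamic batching: cap the number of padded tokens per batch instead of the number of examples, so short examples pack into big batches and long ones don't OOM. Names and the token budget are illustrative, not our actual dataloader.

```python
def make_dynamic_batches(lengths, max_tokens=4096):
    """Yield lists of example indices whose padded size stays under max_tokens."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])  # group similar lengths
    batch, max_len = [], 0
    for i in order:
        max_len = max(max_len, lengths[i])            # padded length of the batch so far
        if batch and max_len * (len(batch) + 1) > max_tokens:
            yield batch                               # adding i would blow the budget
            batch, max_len = [], lengths[i]
        batch.append(i)
    if batch:
        yield batch


lengths = [5, 12, 7, 30, 9]  # token counts per example
print(list(make_dynamic_batches(lengths, max_tokens=32)))  # [[0, 2, 4], [1], [3]]
```

These index lists can then be passed to `torch.utils.data.DataLoader` via its `batch_sampler` argument.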
Future:
- Get Marian and Pegasus tweeted and "released"; support as needed.
- tfbart
- CARVE OUT RESEARCH TIME: freezing idea.
- Seq2Seq dataloader: dynamic batch sizes, plus cleanup so we always use `prepare_seq2seq_batch` (usage sketch after this list).
- Seq2Seq finetuning on TPU (through `Trainer`!)
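For the cleanup item above, the intended call shape is roughly this (a sketch assuming the tokenizer's `prepare_seq2seq_batch` API; the Helsinki-NLP checkpoint and the example sentence pair are just for illustration):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ro")
# Tokenize source and target texts in a single call
batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["UN Chief Says There Is No Military Solution in Syria"],
    tgt_texts=["Şeful ONU declară că nu există o soluţie militară în Siria"],
    return_tensors="pt",
)
print(batch.keys())  # source ids/mask plus the tokenized targets
```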