[new model] FSMT has been released + 9 models ported

The FSMT (FairSeq Machine Translation) architecture has been ported from fairseq's wmt19 models.

9 trained models have been ported and made available to everybody:

You will find all the necessary details on the model pages, including sample code, BLEU scores, and the scoring code.

At the moment only the translation part has been thoroughly tested and evaluated with sacrebleu. Other parts (training, finetuning, etc.) may or may not work, so if you encounter any problems, please file an issue and tag @stas00 in it.

FSMT appears to give the best scores for en <=> ru and en <=> de among the models currently in transformers (scores obtained with sacrebleu against the wmt19 dataset). Metrics on a V100 using SortishSampler + --fp16 --bs 64:

All the credit for the high BLEU scores goes to the Facebook team, who did the original architecture design and the massive pretraining.

Our scores are a tad below fairseq’s, since transformers doesn’t support model ensembles at the moment, so we use the best-performing checkpoint instead.
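For anyone wondering why ensembling can change the score, here is a toy sketch of the general idea. It is not fairseq's or transformers' code. It assumes the ensemble averages the per-step next-token probabilities of its member models, and the helper names (`ensemble_log_probs`, `greedy_pick`) and the numbers are made up for illustration.

```python
import math
from typing import List, Sequence


def ensemble_log_probs(per_model_log_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average several models' next-token distributions in probability space.

    per_model_log_probs[m][v] is model m's log-probability for vocabulary item v.
    Returns the log of the averaged probability for each vocabulary item.
    Assumes every log-probability is finite (hypothetical toy helper).
    """
    n_models = len(per_model_log_probs)
    vocab_size = len(per_model_log_probs[0])
    averaged = []
    for v in range(vocab_size):
        column = [log_probs[v] for log_probs in per_model_log_probs]
        # log-sum-exp for numerical stability, then subtract log(n) to get the mean
        peak = max(column)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in column))
        averaged.append(log_sum - math.log(n_models))
    return averaged


def greedy_pick(log_probs: Sequence[float]) -> int:
    """Return the index of the most likely vocabulary item."""
    return max(range(len(log_probs)), key=lambda i: log_probs[i])


# Two made-up "checkpoints" that disagree about the next token.
model_a = [math.log(p) for p in (0.6, 0.3, 0.1)]
model_b = [math.log(p) for p in (0.2, 0.7, 0.1)]

combined = ensemble_log_probs([model_a, model_b])
print("model_a alone picks token", greedy_pick(model_a))
print("ensemble picks token", greedy_pick(combined))
```

In this toy case the single checkpoint and the ensemble choose different tokens, which is the kind of difference that can add up to a small BLEU gap when only one checkpoint is used.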

I’m also working on an article about how the porting process went, which might help others do similar work. I will link to it once it’s done.

I want to thank @sshleifer for your incredible support, time and mentorship in this difficult process. Your caring help and encouragement were invaluable to me, Sam! I especially appreciated how every so often you were gently suggesting that I get back to work on this difficult project and not get distracted with multiple minor easy improvements I wanted to make for the transformers repo. Thank you.

Conversion Instructions

This should work for fairseq models with moses+bpe tokenizers:

# Folder that will receive the converted model
export save_dir='cool_new_fr-en'
python src/transformers/convert_fsmt_original_pytorch_checkpoint_to_pytorch.py \
    --fsmt_checkpoint_path path_to_fairseq_model.pt \
    --pytorch_dump_folder_path "$save_dir"
cp README.md "$save_dir"/  # model card
transformers-cli upload "$save_dir"

For examples please see these scripts that provide conversion for full sets of models:


Great work stas :slight_smile: You finished a hard project with great test coverage!


@stas and @sshleifer you rock :fire: . Great work!


Don’t tell anybody, but most of the tests were already written - I just had to tweak the existing ones and only wrote a few new tests. So the credit for the test coverage goes to all those who wrote the extensive test suite before me.

In fact, most of the code was already there, and I only had to tweak it here and there. So if someone is planning to port a model and it looks intimidating, the foundation has already been laid for you, and there are enough ported architectures that different pieces can be stolen from different existing working models.
