The FSMT (FairSeq Machine Translation) architecture has been ported from fairseq wmt19.
Nine trained models have been ported and made available to everybody:
- 4 wmt19 facebook models (Sergei Edunov, et al.)
- 3 wmt16 and 2 wmt19 allenai models (Jungo Kasai, et al.)
You will find all the necessary details, including sample code, BLEU scores, and the scoring code, on the model pages.
At the moment only the translation part has been thoroughly tested and evaluated with sacrebleu; other parts (training/finetuning/etc.) may or may not work, so if you encounter any problems please file an issue and tag @stas00 in it.
It appears to give the best scores for en <=> ru and en <=> de among the current transformers models we have (scores acquired using `sacrebleu` against the wmt19 dataset). Metrics measured on a V100 using `SortishSampler` + `--fp16 --bs 64`:
All the credit for the high BLEU scores goes to the Facebook team, who did the original architecture design and massive pretraining.
Our scores are a tad below fairseq's, since transformers doesn't yet support model ensembles, so we use the single best-performing checkpoint instead.
I'm also working on an article about how the porting process went, which might help others do similar work. I will link to it once it's done.
I want to thank @sshleifer for your incredible support, time, and mentorship in this difficult process. Your caring help and encouragement were invaluable to me, Sam! I especially appreciated how every so often you gently suggested that I get back to work on this difficult project and not get distracted by the multiple minor easy improvements I wanted to make to the transformers repo. Thank you.
Conversion Instructions
This should work for fairseq models that use Moses + BPE tokenizers:
```bash
export save_dir='cool_new_fr-en'
python src/transformers/convert_fsmt_original_pytorch_checkpoint_to_pytorch.py \
    --fsmt_checkpoint_path path_to_fairseq_model.pt \
    --pytorch_dump_folder_path $save_dir
cp README.md $save_dir/  # model card
transformers-cli upload $save_dir
```
For examples please see these scripts that provide conversion for full sets of models: