The FSMT (FairSeq Machine Translation) architecture has been ported from fairseq wmt19.
9 trained models have been ported and made available to everybody:
- 4 wmt19 facebook models (Sergey Edunov, et al.)
- 3 wmt16 and 2 wmt19 allenai models (Jungo Kasai, et al.)
You will find all the necessary details, including sample code, BLEU scores, and the scoring code, on the model pages.
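For a quick taste, here is a minimal sketch of running one of the ported models for translation. The exact snippets live on the model pages; the checkpoint name below is just one of the nine:

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-en-ru"  # one of the ported checkpoints
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

text = "Machine learning is great, isn't it?"
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```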
At the moment only the translation part has been thoroughly tested and evaluated with sacrebleu; other parts (training/finetuning/etc.) may or may not work, so if you encounter any problems please file an issue and tag @stas00 in it.
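If you want to reproduce the evaluation, the scoring itself boils down to something like the following (a minimal sketch with hypothetical data, not the exact eval harness used here):

```python
import sacrebleu

# model translations and their references (hypothetical examples)
hyps = ["The cat sits on the mat."]
refs = [["The cat is sitting on the mat."]]  # one stream of references, parallel to hyps

bleu = sacrebleu.corpus_bleu(hyps, refs)
print(bleu.score)
```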
It appears to give the best scores for en <=> ru and en <=> de among the current transformers models (scores obtained with sacrebleu against the wmt19 dataset). Metrics on a V100 using `--fp16 --bs 64`:
All the credit for the high BLEU scores goes to the Facebook team, who did the original architecture design and massive pretraining.
Our scores are a tad below fairseq's, since transformers doesn't currently support model ensembles, so we use the best-performing single checkpoint.
I'm also working on an article about how the porting process went, which might help others doing similar work. I will link to it once it's done.
I want to thank @sshleifer for your incredible support, time, and mentorship through this difficult process. Your caring help and encouragement were invaluable to me, Sam! I especially appreciated how every so often you gently suggested that I get back to work on this difficult project and not get distracted by the multiple minor easy improvements I wanted to make to the transformers repo. Thank you.
This should work for fairseq models with moses+bpe tokenizers:
```bash
export save_dir='cool_new_fr-en'
python src/transformers/convert_fsmt_original_pytorch_checkpoint_to_pytorch.py \
    --fsmt_checkpoint_path path_to_fairseq_model.pt \
    --pytorch_dump_folder_path $save_dir
cp README.md $save_dir/  # model card
transformers-cli upload $save_dir
```
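Once converted, the checkpoint should load like any other transformers model (a quick sketch, assuming the dump folder created above):

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

save_dir = "cool_new_fr-en"  # the folder produced by the conversion script
tokenizer = FSMTTokenizer.from_pretrained(save_dir)
model = FSMTForConditionalGeneration.from_pretrained(save_dir)
```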
For examples, please see these scripts, which provide conversion for full sets of models: