The FSMT (FairSeq Machine Translation) architecture has been ported from fairseq wmt19.
Nine trained models have been ported and made available to everybody:
- 4 wmt19 facebook models (Sergei Edunov, et al.)
- 3 wmt16 and 2 wmt19 allenai models (Jungo Kasai, et al.)
You will find all the necessary details, including sample code, BLEU scores, and the scoring code, on the model pages.
At the moment only the translation part has been thoroughly tested and evaluated with sacrebleu; other parts (training/finetuning/etc.) may or may not work, so if you encounter any problems please file an issue and tag @stas00 in it.
It appears to give the best scores for en <=> ru and en <=> de among the current transformers models we have (scores acquired using `sacrebleu` against the wmt19 dataset). Metrics measured on a V100 using `SortishSampler` + `--fp16 --bs 64`:
All the credit for the high BLEU scores goes to the Facebook team, who did the original architecture design and massive pretraining.
Our scores are a tad below fairseq's, since transformers doesn't yet support model ensembles, so we use the single best-performing checkpoint instead.
I'm also working on an article about how the porting process went, which might help others do similar work. I will link to it once it's done.
I want to thank @sshleifer for your incredible support, time, and mentorship in this difficult process. Your caring help and encouragement were invaluable to me, Sam! I especially appreciated how every so often you gently suggested that I get back to work on this difficult project and not get distracted by the multiple minor easy improvements I wanted to make to the transformers repo. Thank you.
Conversion Instructions
This should work for fairseq models that use Moses + BPE tokenizers:
```bash
export save_dir='cool_new_fr-en'
python src/transformers/convert_fsmt_original_pytorch_checkpoint_to_pytorch.py \
    --fsmt_checkpoint_path path_to_fairseq_model.pt \
    --pytorch_dump_folder_path $save_dir
cp README.md $save_dir/  # model card
transformers-cli upload $save_dir
```
For examples please see these scripts that provide conversion for full sets of models: