How can I convert a model created with fairseq?

I fine-tuned Facebook’s mbart.cc25 model for machine translation with fairseq, and it saved the model as checkpoint_*.pt. Is it possible to use it with Transformers now, and if so, how? Thanks

Unless Transformers uses the same parameter naming conventions as fairseq, this is not possible out of the box. However, with a bit of digging you should be able to map the names from one to the other.

@sshleifer will know the answer.

Maybe this will help: this file contains utilities for converting fairseq BART checkpoints to the HF format.


Great, thanks, I will try it. I will probably have to change it a little, since the script is for BART.

Which fairseq architecture are you converting?
I would be very excited about (and would use) a PR that converts other fairseq formats!

Hi @sshleifer, as mentioned above, I fine-tuned mbart.cc25 for machine translation (en-de) with fairseq. It was actually just for learning purposes, but since it was trained for many hours on multiple GPUs, I thought it would also be useful for others if I put it in Hugging Face’s model zoo, provided I can convert it.

Yes, that would be awesome!
I just pushed a script to the convert-mbart branch; let me know if it works.
PR here

If your config is different from that of mbart-large-en-ro, you should pass it in.

  1. Let me know if that works!
  2. What was your final BLEU/how long did it take to train?

Cool! Thanks a lot, I will try it.

By the way, when I ran the script I got the following error message:

    Traceback (most recent call last):
      File "./", line 7, in <module>
        from .convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_
    ModuleNotFoundError: No module named '__main__.convert_bart_original_pytorch_checkpoint_to_pytorch'; '__main__' is not a package

After changing from .convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_ to from convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_ (just removing the dot), the script runs.
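The error comes from the leading dot: it is a relative import, and a script run as a plain file becomes the top-level __main__ module with no parent package, so Python cannot resolve it. Removing the dot works when the helper file sits in the same directory. A commonly used alternative is to run the script as a module with python -m, which keeps the relative import intact. The sketch below demonstrates both behaviours on a throwaway package; the package name mypkg and its module names are hypothetical stand-ins for the transformers files, not the actual script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_relative_import_demo() -> dict:
    """Build a hypothetical package `mypkg` and run one of its modules two ways."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "mypkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # Stand-in for convert_bart_original_pytorch_checkpoint_to_pytorch.py
        (pkg / "helpers.py").write_text(
            "def remove_ignore_keys_(state_dict):\n"
            "    state_dict.pop('ignored', None)\n"
        )
        # Stand-in for the conversion script: note the relative import
        (pkg / "convert.py").write_text(
            "from .helpers import remove_ignore_keys_\n"
            "print('import ok')\n"
        )

        # 1) Run as a plain file: no parent package, so the relative import fails
        as_file = subprocess.run(
            [sys.executable, str(pkg / "convert.py")],
            capture_output=True, text=True,
        )
        # 2) Run as a module from the package's parent directory: the import resolves
        as_module = subprocess.run(
            [sys.executable, "-m", "mypkg.convert"],
            cwd=tmp, capture_output=True, text=True,
        )
    return {
        "file_returncode": as_file.returncode,
        "file_stderr": as_file.stderr,
        "module_returncode": as_module.returncode,
        "module_stdout": as_module.stdout,
    }


if __name__ == "__main__":
    result = run_relative_import_demo()
    print("plain file:", result["file_returncode"])
    print("python -m :", result["module_returncode"], result["module_stdout"].strip())
```

For the real script this would mean invoking it through its dotted module path with python -m instead of editing the import; the exact module name is not shown in the thread, so removing the dot remains the simpler fix when running the file directly.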


Let us know when you have succeeded! I’m excited about this!

Hi, sorry for the late answer. I tried to convert it with this command:

python ~/Work/transformers/src/transformers/ --hf_config facebook/mbart-large-cc25 wmt14-en-de/ out

but I got the following error message:

    Traceback (most recent call last):
      File "../../transformers/src/transformers/", line 36, in <module>
        model = convert_fairseq_mbart_checkpoint_from_disk(args.fairseq_path, hf_config_path=args.hf_config)
      File "../../transformers/src/transformers/", line 18, in convert_fairseq_mbart_checkpoint_from_disk
      File "/home/wirawan/miniconda3/envs/transformers-cuda9/lib/python3.7/site-packages/torch/nn/modules/", line 847, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for BartModel:
            Unexpected key(s) in state_dict: "decoder.output_projection.weight".

However, converting Facebook’s original mbart-finetuned_en-ro model did work.

There is a bug report with the same error message, but I can’t get its solution/workaround to work.

I can add “decoder.output_projection.weight” to the ignore_keys in src/transformers/; then the conversion runs without an error message and saves the model as pytorch_model.bin. But I guess this is not the correct way to solve it?
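Conceptually, adding a key to the ignore list just means deleting it from the fairseq state dict before it is loaded into the Transformers model. A minimal sketch, assuming the state dict behaves like a plain mapping from parameter names to tensors; the helper drop_ignored_keys and the IGNORE_KEYS list are hypothetical and not the conversion script’s actual code.

```python
from typing import Iterable, List, MutableMapping


def drop_ignored_keys(state_dict: MutableMapping[str, object],
                      ignore_keys: Iterable[str]) -> List[str]:
    """Delete every key in `ignore_keys` from `state_dict` in place.

    Returns the keys that were actually present and removed, so the
    conversion can log exactly what it threw away.
    """
    removed = []
    for key in ignore_keys:
        if key in state_dict:
            del state_dict[key]
            removed.append(key)
    return removed


# Hypothetical ignore list: fairseq bookkeeping entries plus the extra key
# reported in the traceback above.
IGNORE_KEYS = [
    "encoder.version",
    "decoder.version",
    "decoder.output_projection.weight",
]

if __name__ == "__main__":
    # Toy state dict with strings standing in for tensors
    sd = {
        "encoder.version": "v",
        "decoder.output_projection.weight": "W",
        "decoder.embed_tokens.weight": "E",
    }
    print(drop_ignored_keys(sd, IGNORE_KEYS))
    print(sorted(sd))
```

Deleting a key silently discards its values, which is why it is worth checking what the parameter actually contains before ignoring it.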

No, that is a good way to solve it. Just make sure that the “decoder.output_projection.weight” parameter is all zeros (it should be).
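One way to run that check is to load the fairseq checkpoint with PyTorch and test whether every entry of the tensor is zero. A minimal sketch, assuming PyTorch 1.13 or newer (for the weights_only argument) and a fairseq checkpoint that stores its weights under a "model" key; the function names are hypothetical.

```python
import torch


def is_all_zero(tensor: torch.Tensor) -> bool:
    """Return True if every element of `tensor` is exactly zero."""
    return bool(torch.all(tensor == 0))


def check_checkpoint_param(checkpoint_path: str,
                           key: str = "decoder.output_projection.weight") -> bool:
    # map_location="cpu" lets this run without the GPUs used for training.
    # weights_only=False: fairseq checkpoints pickle more than tensors
    # (e.g. training args); only load checkpoints you trust this way.
    state = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
    weight = state["model"][key]
    print(f"{key}: shape={tuple(weight.shape)}, abs max={weight.abs().max().item():.4g}")
    return is_all_zero(weight)


if __name__ == "__main__":
    import sys
    print("all zero:", check_checkpoint_param(sys.argv[1]))
```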

Unfortunately, the “decoder.output_projection.weight” parameter is not zero at all, as you can see here:

It is probably the same as decoder.embed_tokens.weight, according to
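Whether the two entries really hold the same values is easy to confirm. A minimal sketch, assuming PyTorch 1.13 or newer and a fairseq checkpoint that stores its weights under a "model" key; the helper names are hypothetical.

```python
import torch


def tensors_identical(a: torch.Tensor, b: torch.Tensor) -> bool:
    """True if `a` and `b` have the same shape and exactly the same values."""
    return a.shape == b.shape and torch.equal(a, b)


def output_projection_is_tied(checkpoint_path: str) -> bool:
    # weights_only=False: fairseq checkpoints pickle more than tensors;
    # only load checkpoints you trust this way.
    state = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
    model_state = state["model"]
    return tensors_identical(
        model_state["decoder.output_projection.weight"],
        model_state["decoder.embed_tokens.weight"],
    )


if __name__ == "__main__":
    import sys
    print("same as decoder.embed_tokens.weight:", output_projection_is_tied(sys.argv[1]))
```

If this returns True, the key holds a copy of the embedding matrix rather than separately learned weights.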

Interesting. I’d try renaming that parameter to model.final_logits_bias and then checking that your model is still good after conversion.
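Renaming a parameter amounts to popping the tensor stored under the old key and storing it under the new one before calling load_state_dict. A minimal sketch with a hypothetical helper rename_key; it only moves the entry and does not reshape the tensor.

```python
from typing import MutableMapping


def rename_key(state_dict: MutableMapping[str, object], old: str, new: str) -> None:
    """Move the value stored under `old` to `new` in place.

    Raises ValueError if `new` already exists and KeyError if `old` is
    missing, so a typo in either name fails loudly instead of silently.
    """
    if new in state_dict:
        raise ValueError(f"{new!r} already present in state dict")
    state_dict[new] = state_dict.pop(old)


if __name__ == "__main__":
    # Toy state dict with strings standing in for tensors
    sd = {"decoder.output_projection.weight": "W", "decoder.embed_tokens.weight": "E"}
    rename_key(sd, "decoder.output_projection.weight", "model.final_logits_bias")
    print(sorted(sd))
```

load_state_dict checks tensor shapes, so if the renamed tensor does not have the shape the target model expects under that name, it reports a size mismatch; that is a quick first check before comparing translations.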

By the way, I have a dumb question: how can I translate text using the Transformers mBART model? I know how to do it with fairseq, but not with Transformers 😊

Have a look at the seq2seq tutorial:
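For reference, translating with an mBART checkpoint in Transformers comes down to tokenizing with the source language code and starting generation from the target language code. A minimal sketch, assuming a recent transformers 4.x release with MBartTokenizer and MBartForConditionalGeneration; the language codes, the example sentence and the helper name translate are assumptions, and passing the target code as decoder_start_token_id follows the pattern documented for facebook/mbart-large-en-ro.

```python
from typing import List


def translate(texts: List[str], tokenizer, model, tgt_lang: str,
              max_length: int = 128) -> List[str]:
    """Translate `texts` with an mBART-style tokenizer/model pair.

    The tokenizer is expected to have been created with the source language
    code (src_lang=...), so it appends that code to each input sequence.
    """
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(
        **batch,
        # mBART decoders start from the target language code token
        decoder_start_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=max_length,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    # Imported here so `translate` itself has no hard dependency on transformers
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    name = "facebook/mbart-large-en-ro"  # e.g. "cahya/mbart-large-en-de" with tgt_lang="de_DE"
    tok = MBartTokenizer.from_pretrained(name, src_lang="en_XX")
    mdl = MBartForConditionalGeneration.from_pretrained(name)
    print(translate(["UN Chief Says There Is No Military Solution in Syria"], tok, mdl, tgt_lang="ro_RO"))
```

Keeping the tokenizer and model as arguments makes it easy to run the same sentences through the original and the converted checkpoint and compare the outputs side by side.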

Thanks, I will try it

@sshleifer For testing purposes I converted the fairseq mBART model to a Transformers mBART model, ignoring decoder.output_projection.weight, and uploaded the result to the Hugging Face model hub as “cahya/mbart-large-en-de” (for some reason it doesn’t show up in , but I can load it in a script as a pretrained model).
Since I want to know whether the converted model works, I created a simple Jupyter notebook to do translation, following @BramVanroy’s suggestion. Before testing my converted mBART model, I tested the notebook with “facebook/mbart-large-en-ro”, and the translation seems to work properly, but it doesn’t work with my mBART model.
Does that mean the converted mBART model is incomplete/corrupt because I ignored “decoder.output_projection.weight”? Or did I miss something in the Jupyter notebook? Thanks

I think you need to convert decoder.output_projection.weight -> final_logits_bias