How can I convert a model created with fairseq?

cahya · August 2, 2020, 7:20pm

Hi,
I fine-tuned Facebook's mbart.cc25 model for machine translation with fairseq, and it saved the model as checkpoint_*.pt. How can I use it with Transformers now? Is that possible? Thanks

BramVanroy · August 2, 2020, 8:45pm

Unless the naming conventions that are used in transformers are the same as in fairseq, this is not possible out of the box. However, with a bit of digging you should be able to map them.

@sshleifer will know the answer.

valhalla · August 3, 2020, 6:02am

Maybe this will help. This file contains utilities for converting fairseq BART checkpoints to HF format.

cahya · August 3, 2020, 6:15am

Great. Thanks, I will try it. I will probably have to change it a little, since the script is for BART.

sshleifer · August 3, 2020, 2:48pm

Which fairseq arch are you converting?
I would be very excited about (and would use) a PR that converts other fairseq formats!

cahya · August 3, 2020, 4:00pm

Hi @sshleifer, as mentioned above, I fine-tuned mbart.cc25 for machine translation (en-de) with fairseq. It was actually just for learning purposes, but since it was trained for many hours on multiple GPUs, I thought it would also be useful for others if I put it in the Hugging Face model zoo, provided I am able to convert it.

sshleifer · August 3, 2020, 10:42pm

yes that would be awesome!
I just pushed a script to the convert-mbart branch, let me know if that works.
pr here

If your config is different than mbart-large-en-ro, you should pass it in.

Let me know if that works!
What was your final BLEU/how long did it take to train?

cahya · August 4, 2020, 3:59pm

Cool! thanks a lot, I will try it.

cahya · August 4, 2020, 4:15pm

Btw, when I run the script I get the following error message:

Traceback (most recent call last):
  File "./convert_mbart_original_checkpoint_to_pytorch.py", line 7, in <module>
    from .convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_
ModuleNotFoundError: No module named '__main__.convert_bart_original_pytorch_checkpoint_to_pytorch'; '__main__' is not a package

After changing "from .convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_" to "from convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_" (just removing the dot), the script runs.

sshleifer · August 10, 2020, 3:04am

Let us know when you have succeeded! I’m excited about this!

cahya · August 10, 2020, 3:23pm

Hi, sorry for the late answer. I tried to convert it with this command:

python ~/Work/transformers/src/transformers/convert_mbart_original_checkpoint_to_pytorch.py --hf_config facebook/mbart-large-cc25 wmt14-en-de/checkpoint_best.pt out

but I get the following error message:

    Traceback (most recent call last):
      File "../../transformers/src/transformers/convert_mbart_original_checkpoint_to_pytorch.py", line 36, in <module>
        model = convert_fairseq_mbart_checkpoint_from_disk(args.fairseq_path, hf_config_path=args.hf_config)
      File "../../transformers/src/transformers/convert_mbart_original_checkpoint_to_pytorch.py", line 18, in convert_fairseq_mbart_checkpoint_from_disk
        model.model.load_state_dict(state_dict)
      File "/home/wirawan/miniconda3/envs/transformers-cuda9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for BartModel:
            Unexpected key(s) in state_dict: "decoder.output_projection.weight".

However, converting the original mbart fine-tuned en-ro model from Facebook did work: https://dl.fbaipublicfiles.com/fairseq/models/mbart/mbart.cc25.ft.enro.tar.gz

There was a bug report with the same error message (https://github.com/pytorch/fairseq/issues/2031), but I can’t get the suggested workaround to work.

cahya · August 11, 2020, 3:16pm

I can add “decoder.output_projection.weight” to the ignore_keys in src/transformers/convert_bart_original_pytorch_checkpoint_to_pytorch.py. The conversion then runs without an error message and saves the model as pytorch_model.bin, but I guess this is not the correct way to solve it?

sshleifer · August 11, 2020, 7:02pm

No, that is a good way to solve it. Just make sure that the “decoder.output_projection.weight” param is all zero (it should be).
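
For readers following along, the workaround discussed above amounts to dropping the extra key from the fairseq state dict before it is loaded into the Transformers model. A minimal sketch, using a plain dict in place of a real checkpoint; the helper name drop_ignored_keys and the toy values are made up for illustration and are not the conversion script's actual code:

```python
def drop_ignored_keys(state_dict, ignore_keys):
    """Remove the given keys from state_dict in place; missing keys are skipped."""
    for key in ignore_keys:
        state_dict.pop(key, None)
    return state_dict


# Toy stand-in for a fairseq state dict (real values would be tensors).
toy_state_dict = {
    "decoder.embed_tokens.weight": [[0.1, 0.2]],
    "decoder.output_projection.weight": [[0.1, 0.2]],
}
drop_ignored_keys(toy_state_dict, ["decoder.output_projection.weight"])
print(sorted(toy_state_dict))
```

Dropping a key this way is only safe when the discarded tensor carries nothing the converted model needs, which is why the parameter should be inspected before it is ignored.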

cahya · August 13, 2020, 8:43am

Unfortunately, the “decoder.output_projection.weight” param is not zero at all, as you can see here:

It is probably the same as decoder.embed_tokens.weight, according to fairseq/transformer.py at commit 522c76ba1646cd5ec2cd4be29392f53d40aec50a in facebookresearch/fairseq on GitHub.
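
One way to test that guess is to compare the two tensors directly. A minimal sketch, assuming PyTorch is installed; the function name output_projection_is_tied is hypothetical, and the tensors below are toy stand-ins rather than real checkpoint weights:

```python
import torch


def output_projection_is_tied(state_dict):
    """Return True if decoder.output_projection.weight equals decoder.embed_tokens.weight."""
    proj = state_dict["decoder.output_projection.weight"]
    emb = state_dict["decoder.embed_tokens.weight"]
    return proj.shape == emb.shape and torch.equal(proj, emb)


# Toy example: the projection is an exact copy of the embedding matrix.
emb = torch.arange(6, dtype=torch.float32).reshape(3, 2)
toy = {
    "decoder.embed_tokens.weight": emb,
    "decoder.output_projection.weight": emb.clone(),
}
print(output_projection_is_tied(toy))  # True
```

On a real checkpoint the state dict would presumably come from something like torch.load("wmt14-en-de/checkpoint_best.pt", map_location="cpu")["model"], assuming the usual fairseq layout with the weights under the "model" key. If the check returns True, the projection is just a copy of the embeddings and holds no extra information.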

sshleifer · August 13, 2020, 4:43pm

Interesting. I’d try renaming that parameter to model.final_logits_bias and then checking that your model is still good after conversion.

cahya · August 13, 2020, 5:10pm

Btw, I have a dumb question: how can I translate text using the Transformers mbart? I know how to do it with fairseq, but not with Transformers.

BramVanroy · August 15, 2020, 7:41am

Have a look at the seq2seq tutorial: https://github.com/huggingface/transformers/tree/master/examples/seq2seq
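
For a quick check outside the tutorial, something along these lines may be enough. This is a rough sketch, assuming a reasonably recent transformers release with sentencepiece and PyTorch installed; the tokenizer API has changed between versions, so the exact calls may need adjusting. The translate helper is a made-up name, and the defaults point at the facebook/mbart-large-en-ro model mentioned earlier in this thread:

```python
def translate(text, model_name="facebook/mbart-large-en-ro",
              src_lang="en_XX", tgt_lang="ro_RO"):
    """Translate one sentence with an mBART checkpoint from the model hub."""
    # Imported lazily so that defining the function does not require the library.
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained(model_name, src_lang=src_lang)
    model = MBartForConditionalGeneration.from_pretrained(model_name)

    inputs = tokenizer(text, return_tensors="pt")
    # mBART starts decoding from the target-language code token.
    generated = model.generate(
        **inputs,
        decoder_start_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Calling translate("UN Chief Says There Is No Military Solution in Syria") downloads the model on first use and returns the Romanian translation as a string. For an en-de model, tgt_lang would be "de_DE".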

cahya · August 15, 2020, 2:21pm

Thanks, I will try it

cahya · August 17, 2020, 6:36pm

@sshleifer For testing purposes I converted the fairseq mbart to a Transformers mbart, ignoring decoder.output_projection.weight, and uploaded the result to the Hugging Face model hub as “cahya/mbart-large-en-de” (for some reason it doesn’t show up in https://huggingface.co/models, but I can load it in a script as a pretrained model).
Since I want to know whether the converted model works, I created a simple Jupyter notebook (https://gist.github.com/cahya-wirawan/0e3eedbcd78c28602dbc554c447aed2a) to do translation, following @BramVanroy’s suggestion. Before testing my converted mbart model, I tested the notebook with “facebook/mbart-large-en-ro”, and the translation seems to work properly, but it doesn’t work with my mbart model.
Does this mean that the converted mbart model is incomplete/corrupt because I ignored “decoder.output_projection.weight”? Or did I miss something in the Jupyter notebook? Thanks

sshleifer · August 17, 2020, 6:48pm

I think you need to convert decoder.output_projection.weight -> final_logits_bias
