I fine-tuned Facebook’s mbart.cc25 model for machine translation with Fairseq, and it saved its model as checkpoint_*.pt. How can I use it now with Transformers? Is that possible? Thanks
Unless the naming conventions used in Transformers are the same as in fairseq, this is not possible out of the box. However, with a bit of digging you should be able to map them.
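To illustrate what such a mapping looks like, here is a minimal sketch of renaming state-dict keys by prefix. The prefixes and keys below are hypothetical examples, not the actual mbart parameter names:

```python
# Sketch: remap checkpoint keys from one naming convention to another.
# The prefixes here are made-up examples, not the real fairseq/HF key names.
def remap_keys(state_dict, rename_rules):
    """Return a new dict with each key's prefix rewritten per rename_rules."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in rename_rules.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped

fairseq_style = {"encoder.layers.0.fc1.weight": "w1", "decoder.layers.0.fc1.weight": "w2"}
rules = {"encoder.": "model.encoder.", "decoder.": "model.decoder."}
print(remap_keys(fairseq_style, rules))
```

In practice you would build the rules by diffing `model.state_dict().keys()` on both sides.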
@sshleifer will know the answer.
Maybe this will help. This file contains utilities for converting fairseq BART checkpoints to HF format.
Great, thanks, I will try it. I will probably have to change it a little, since the script is for BART.
Which arch are you converting?
I would be very excited about (and would use) a PR that converts other fairseq formats!
Hi @sshleifer, as mentioned above I fine-tuned mbart.cc25 for machine translation (en-de) with Fairseq. It was actually just for learning purposes, but since it was trained for many hours on multiple GPUs, I thought it would also be good for others if I put it in Hugging Face’s model zoo, assuming I am able to convert it.
yes that would be awesome!
I just pushed a script to the convert-mbart branch. If your config is different from mbart-large-en-ro, you should pass it in.
- Let me know if that works!
- What was your final BLEU / how long did it take to train?
Cool! Thanks a lot, I will try it.
Btw, when I run the script I get the following error message:
Traceback (most recent call last):
  File "./convert_mbart_original_checkpoint_to_pytorch.py", line 7, in <module>
    from .convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_
ModuleNotFoundError: No module named '__main__.convert_bart_original_pytorch_checkpoint_to_pytorch'; '__main__' is not a package
After I change
from .convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_
to
from convert_bart_original_pytorch_checkpoint_to_pytorch import remove_ignore_keys_
(just removing the dot), the script runs.
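A note on why removing the dot helps: a relative import only works when the file is executed as part of a package, not when it is run directly as a script. A tiny reproduction (the package, file, and function names here are throwaway stand-ins, not the real transformers layout):

```shell
# Build a minimal throwaway package to reproduce the error.
mkdir -p demo_pkg
touch demo_pkg/__init__.py
printf 'def remove_ignore_keys_(d):\n    return d\n' > demo_pkg/helper.py
printf 'from .helper import remove_ignore_keys_\nprint("ok")\n' > demo_pkg/convert.py

# Running the file directly fails: Python treats it as __main__, not as part of a package.
python3 demo_pkg/convert.py 2>/dev/null || echo "relative import failed"

# Running it as a module keeps the package context, so the relative import works.
python3 -m demo_pkg.convert
```

So instead of editing the import, another option is to invoke the script with `python -m` from the package root.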
Let us know when you have succeeded! I’m excited about this!
Hi, sorry for the late answer. I tried to convert it with this command:
python ~/Work/transformers/src/transformers/convert_mbart_original_checkpoint_to_pytorch.py --hf_config facebook/mbart-large-cc25 wmt14-en-de/checkpoint_best.pt out
but I get the following error message:
Traceback (most recent call last):
  File "../../transformers/src/transformers/convert_mbart_original_checkpoint_to_pytorch.py", line 36, in <module>
    model = convert_fairseq_mbart_checkpoint_from_disk(args.fairseq_path, hf_config_path=args.hf_config)
  File "../../transformers/src/transformers/convert_mbart_original_checkpoint_to_pytorch.py", line 18, in convert_fairseq_mbart_checkpoint_from_disk
    model.model.load_state_dict(state_dict)
  File "/home/wirawan/miniconda3/envs/transformers-cuda9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BartModel:
    Unexpected key(s) in state_dict: "decoder.output_projection.weight".
However, it did work for converting Facebook’s original mbart model fine-tuned on en-ro: https://dl.fbaipublicfiles.com/fairseq/models/mbart/mbart.cc25.ft.enro.tar.gz
There was a bug report with the same error message (https://github.com/pytorch/fairseq/issues/2031), but I can’t get the suggested workaround to work.
I can add “decoder.output_projection.weight” to the ignore_keys in src/transformers/convert_bart_original_pytorch_checkpoint_to_pytorch.py; then the conversion runs without an error message and saves the model as pytorch_model.bin. But I guess this is not the correct way to solve it?
No, that is a good way to solve it. Just make sure that the “decoder.output_projection.weight” param is all zeros (it should be).
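The drop-and-verify step suggested above can be sketched like this. Plain lists stand in for the real tensors to keep the example self-contained; the actual `remove_ignore_keys_` in the conversion script similarly pops a list of keys from the state dict, though without the zero check:

```python
# Sketch: drop keys the HF model doesn't expect, but only after checking
# they carry no information (all zeros), as suggested in the reply above.
def pop_ignore_keys(state_dict, ignore_keys, require_zero=True):
    for key in ignore_keys:
        value = state_dict.pop(key, None)
        if value is not None and require_zero and any(v != 0 for v in value):
            raise ValueError(f"{key} is not all zeros; dropping it would lose information")
    return state_dict

sd = {"decoder.embed_tokens.weight": [0.1, 0.2], "decoder.output_projection.weight": [0.0, 0.0]}
pop_ignore_keys(sd, ["decoder.output_projection.weight"])
print(sorted(sd))  # the unexpected key is gone
```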
Unfortunately the “decoder.output_projection.weight” param is not zero at all, as you can see here:
Probably it is the same as decoder.embed_tokens.weight, according to https://github.com/pytorch/fairseq/blob/522c76ba1646cd5ec2cd4be29392f53d40aec50a/fairseq/models/transformer.py#L627
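If the two parameters are indeed tied, the projection carries no extra information and dropping it loses nothing. A quick way to check this, sketched here with a small stand-in state dict (in practice you would load the real checkpoint with `torch.load(...)` and inspect its model weights, using the key names from the error above):

```python
import torch

def is_tied(state_dict, a, b):
    """Return True if two parameters in a state dict hold identical values."""
    return torch.equal(state_dict[a], state_dict[b])

# Stand-in state dict with a deliberately tied pair of weights.
embed = torch.randn(8, 4)
sd = {
    "decoder.embed_tokens.weight": embed,
    "decoder.output_projection.weight": embed.clone(),
}
print(is_tied(sd, "decoder.embed_tokens.weight", "decoder.output_projection.weight"))  # True
```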
Interesting. I’d try renaming that parameter to model.final_logits_bias and then checking that your model is still good after conversion.
Btw, I have a dumb question: how can I translate a text using Transformers’ mbart? I know how to do it with fairseq, but not with Transformers.
Have a look at the seq2seq tutorial: https://github.com/huggingface/transformers/tree/master/examples/seq2seq
Thanks, I will try it
@sshleifer For testing purposes I converted the fairseq mbart to a Transformers mbart, ignoring decoder.output_projection.weight, and uploaded the result to the Hugging Face model hub as “cahya/mbart-large-en-de” (for some reason it doesn’t show up in https://huggingface.co/models, but I can use/load it in a script as a pretrained model).
Since I want to know whether the converted model works, I created a simple Jupyter notebook (https://gist.github.com/cahya-wirawan/0e3eedbcd78c28602dbc554c447aed2a) to do translation, following @BramVanroy’s suggestion. Before trying my converted mbart model, I tested the notebook with “facebook/mbart-large-en-ro”, and the translation seems to work properly, but it doesn’t work with my mbart model.
Does that mean the converted mbart model is incomplete/corrupt because I ignored “decoder.output_projection.weight”? Or did I miss something in the Jupyter notebook? Thanks
I think you need to convert decoder.output_projection.weight -> final_logits_bias