How can I convert a model created with fairseq?

Do you mean to assign decoder.output_projection.weight to model.final_logits_bias, like
model.final_logits_bias = state_dict["decoder.output_projection.weight"]? But their shapes are different ([1, 250027] and [250027, 1024]), or did I misunderstand it?
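For what it's worth, here is a rough sketch of the shapes involved, assuming the usual BART layout in transformers (the checkpoint path is just a placeholder):

```python
import torch

# Load only the model weights from the fairseq checkpoint (path is a placeholder).
state_dict = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

# decoder.output_projection.weight is the LM head weight, shape [vocab, hidden]
# (here [250027, 1024]), while final_logits_bias in transformers' BART is an
# additive bias over the vocab, shape [1, vocab], and is typically all zeros,
# so the two tensors play different roles and cannot be assigned to each other.
lm_head_weight = state_dict["decoder.output_projection.weight"]   # [250027, 1024]
final_logits_bias = torch.zeros(1, lm_head_weight.size(0))        # [1, 250027]
```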

I misunderstood the shape. If you send me a link to your full model dir I can try to get it working. Might need to change modeling_bart.py.

Hi, here is the link to the model directory: https://easyupload.io/8881kn
It’s almost 10GB uncompressed. I don’t know why fairseq created such a huge model; maybe some weights are duplicated. Thanks a lot for taking a look at it.

I think fairseq also saves the optimizer and scheduler state in the checkpoint, which is why it is so large.
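If you only need the weights for conversion, you could strip everything but the model state dict before sharing; a minimal sketch (paths are placeholders, and keys other than "model" can vary between fairseq versions):

```python
import torch

# Load the full fairseq checkpoint (weights plus optimizer/scheduler state).
sd = torch.load("checkpoint_best.pt", map_location="cpu")

# Keep only the model weights; the optimizer/scheduler state is usually
# what makes the file so large.
torch.save({"model": sd["model"]}, "checkpoint_best_slim.pt")
```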


Hi,
Is there a solution to this problem yet, @sshleifer? I am facing the same issues when trying to convert a fine-tuned BART checkpoint to Hugging Face.

Thank you!

You could try the FSMT converter written by @stas.

The only difference between BART and FSMT (as far as I know) is that FSMT allows separate decoder embeddings.

Good work, I was able to reproduce the results with the FSMT conversion (FSMT tutorial). I am not quite sure which parameters to change in order to apply the conversion to BART. Any helpful suggestions here, @stas?
Highly appreciated!

Check what the BART configuration is and adjust whatever is different. FSMT also has a slightly different architecture: some layers were added, some removed. Vocab sizes can differ, so some config args differ as well. So the result definitely won’t be the same as what transformers' BART expects.
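One quick way to see which config args differ could be to diff the two configs directly (the model names below are only examples):

```python
from transformers import BartConfig, FSMTConfig

bart_cfg = BartConfig.from_pretrained("facebook/bart-large")
fsmt_cfg = FSMTConfig.from_pretrained("facebook/wmt19-en-de")

# Keys that exist in one config but not the other, plus the vocab sizes,
# which differ because FSMT keeps separate source/target vocabularies.
print(sorted(set(bart_cfg.to_dict()) ^ set(fsmt_cfg.to_dict())))
print(bart_cfg.vocab_size, fsmt_cfg.src_vocab_size, fsmt_cfg.tgt_vocab_size)
```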

Probably the easiest way to proceed is for you to start and then ask specifically if something isn’t working right.

p.s. but there is also the bart conversion script, no? Won’t that do the trick?

nvm, solved by prepending one row vector for the padding id.
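In case it helps someone else, a rough sketch of that fix (the key name, offset, and path are illustrative; the exact layout depends on the tokenizer/vocab):

```python
import torch

# Load only the model weights from the fairseq checkpoint (path is a placeholder).
state_dict = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

# The two vocabularies can be off by one row (an extra padding id), so
# prepend a row for the padding id to realign the embedding matrix.
emb = state_dict["decoder.embed_tokens.weight"]    # [vocab, hidden]
pad_row = emb.new_zeros((1, emb.size(1)))          # extra row for <pad>
state_dict["decoder.embed_tokens.weight"] = torch.cat([pad_row, emb], dim=0)
```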

Hi! I’m trying to convert a fairseq-trained BART model to Hugging Face too. I was able to load the weights, but when I try to generate sequences with the Hugging Face model, the first token is ignored in translation for some reason. Does anyone have any thoughts?

Hi, I am getting this error while converting the model: size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]). Any ideas how it can be solved?
I am using --hf_config facebook/mbart-large-cc25

Hi @Anas12091101
I think you need to change config.vocab_size to 250027, can you share more details on how you are running your script?
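For the transformers side, a minimal sketch of that change, assuming the target model is MBART (the exact class depends on your setup):

```python
from transformers import MBartConfig, MBartForConditionalGeneration

# Start from the mbart-large-cc25 config and make sure the vocab size
# matches the 250027-row embedding matrix in the checkpoint.
config = MBartConfig.from_pretrained("facebook/mbart-large-cc25")
config.vocab_size = 250027
model = MBartForConditionalGeneration(config)
```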

hi @ybelkada, thanks for your reply. I copied the convert_bart_original_pytorch_checkpoint_to_pytorch.py script and pasted it into a convert.py file. I finetuned the mbart-cc25 model and stored the checkpoint at /datadrive/checkpoint/checkpoint_best.pt. Now, to convert the model, I am running python convert.py /datadrive/checkpoint/checkpoint_best.pt ~/ --hf_config facebook/mbart-large-cc25 and getting this error:

```
Traceback (most recent call last):
  File "convert.py", line 137, in <module>
    convert_bart_checkpoint(args.fairseq_path, args.pytorch_dump_folder_path, hf_checkpoint_name=args.hf_config)
  File "/home/anassmohammad19/nlp/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "convert.py", line 80, in convert_bart_checkpoint
    bart = load_xsum_checkpoint(checkpoint_path)
  File "convert.py", line 61, in load_xsum_checkpoint
    hub_interface.model.load_state_dict(sd["model"], strict=False)
  File "/home/anassmohammad19/nlp/lib/python3.7/site-packages/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/anassmohammad19/nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BARTModel:
	size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]).
	size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]).
	size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]).
```