How can I convert a model created with fairseq?

Do you mean to assign decoder.output_projection.weight to model.final_logits_bias, like
model.final_logits_bias = state_dict["decoder.output_projection.weight"]? But their shapes are different ([1, 250027] and [250027, 1024]), or did I misunderstand it?
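For what it's worth, here is a rough sketch of the shapes involved, assuming the usual BART layout in transformers (the checkpoint path is just a placeholder):

```python
import torch

# Load only the model weights from the fairseq checkpoint (path is a placeholder).
state_dict = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

# decoder.output_projection.weight is the LM head weight, shape [vocab, hidden]
# (here [250027, 1024]), while final_logits_bias in transformers' BART is an
# additive bias over the vocab, shape [1, vocab], and is typically all zeros,
# so the two tensors play different roles and cannot be assigned to each other.
lm_head_weight = state_dict["decoder.output_projection.weight"]   # [250027, 1024]
final_logits_bias = torch.zeros(1, lm_head_weight.size(0))        # [1, 250027]
```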

I misunderstood the shape. If you send me a link to your full model dir I can try to get it working. Might need to change modeling_bart.py.

Hi, here is the link to the model directory: https://easyupload.io/8881kn
It’s almost 10GB uncompressed. I don’t know why fairseq created such a huge model; maybe some weights are duplicated. Thanks a lot for taking a look at it.

I think fairseq also saves the optimizer and scheduler state in the checkpoint, which is why it is so large.
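If you only need the weights for conversion, you could strip everything but the model state dict before sharing; a minimal sketch (paths are placeholders, and keys other than "model" can vary between fairseq versions):

```python
import torch

# Load the full fairseq checkpoint (weights plus optimizer/scheduler state).
sd = torch.load("checkpoint_best.pt", map_location="cpu")

# Keep only the model weights; the optimizer/scheduler state is usually
# what makes the file so large.
torch.save({"model": sd["model"]}, "checkpoint_best_slim.pt")
```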


Hi,
Is there a solution to this problem yet, @sshleifer? I am facing the same issues when trying to convert a fine-tuned BART checkpoint to Hugging Face.

Thank you!

You could try the FSMT converter written by @stas.

The only difference between BART and FSMT (as far as I know) is that FSMT allows separate decoder embeddings.

Good work, I was able to reproduce the results with the FSMT conversion (FSMT tutorial). I am not quite sure which parameters to change in order to apply the conversion to BART. Any helpful suggestions here, @stas?
Highly appreciated!

Check what the BART configuration is and adjust whatever is different. FSMT also has a slightly different architecture: some layers were added, some removed. Vocab sizes can differ, so some config args differ as well. So the result definitely won’t be the same as what transformers' BART expects.
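One quick way to see which config args differ could be to diff the two configs directly (the model names below are only examples):

```python
from transformers import BartConfig, FSMTConfig

bart_cfg = BartConfig.from_pretrained("facebook/bart-large")
fsmt_cfg = FSMTConfig.from_pretrained("facebook/wmt19-en-de")

# Keys that exist in one config but not the other, plus the vocab sizes,
# which differ because FSMT keeps separate source/target vocabularies.
print(sorted(set(bart_cfg.to_dict()) ^ set(fsmt_cfg.to_dict())))
print(bart_cfg.vocab_size, fsmt_cfg.src_vocab_size, fsmt_cfg.tgt_vocab_size)
```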

Probably the easiest way to proceed is for you to start and then ask specifically if something isn’t working right.

p.s. but there is also the bart conversion script, no? Won’t that do the trick?

nvm, solved by prepending one row vector for the padding id.
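In case it helps someone else, a rough sketch of that fix (the key name, offset, and path are illustrative; the exact layout depends on the tokenizer/vocab):

```python
import torch

# Load only the model weights from the fairseq checkpoint (path is a placeholder).
state_dict = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

# The two vocabularies can be off by one row (an extra padding id), so
# prepend a row for the padding id to realign the embedding matrix.
emb = state_dict["decoder.embed_tokens.weight"]    # [vocab, hidden]
pad_row = emb.new_zeros((1, emb.size(1)))          # extra row for <pad>
state_dict["decoder.embed_tokens.weight"] = torch.cat([pad_row, emb], dim=0)
```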

Hi! I’m trying to convert a fairseq-trained BART model to Hugging Face too. I was able to load the weights, but when I try to generate sequences with the Hugging Face model, the first token is ignored in translation for some reason. Does anyone have any thoughts?

Hi, I am getting this error while converting the model: size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]). Any ideas how it can be solved?
I am using --hf_config facebook/mbart-large-cc25

Hi @Anas12091101
I think you need to change config.vocab_size to 250027, can you share more details on how you are running your script?
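For the transformers side, a minimal sketch of that change, assuming the target model is MBART (the exact class depends on your setup):

```python
from transformers import MBartConfig, MBartForConditionalGeneration

# Start from the mbart-large-cc25 config and make sure the vocab size
# matches the 250027-row embedding matrix in the checkpoint.
config = MBartConfig.from_pretrained("facebook/mbart-large-cc25")
config.vocab_size = 250027
model = MBartForConditionalGeneration(config)
```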

hi @ybelkada, thanks for your reply. I copied the convert_bart_original_pytorch_checkpoint_to_pytorch.py script and pasted it into a convert.py file. I finetuned the mbart-cc25 model and stored the checkpoint at /datadrive/checkpoint/checkpoint_best.pt. Now, to convert the model, I am running python convert.py /datadrive/checkpoint/checkpoint_best.pt ~/ --hf_config facebook/mbart-large-cc25 and getting this error:

```
Traceback (most recent call last):
  File "convert.py", line 137, in <module>
    convert_bart_checkpoint(args.fairseq_path, args.pytorch_dump_folder_path, hf_checkpoint_name=args.hf_config)
  File "/home/anassmohammad19/nlp/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "convert.py", line 80, in convert_bart_checkpoint
    bart = load_xsum_checkpoint(checkpoint_path)
  File "convert.py", line 61, in load_xsum_checkpoint
    hub_interface.model.load_state_dict(sd["model"], strict=False)
  File "/home/anassmohammad19/nlp/lib/python3.7/site-packages/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/anassmohammad19/nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BARTModel:
	size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]).
	size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]).
	size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([250027, 1024]) from checkpoint, the shape in current model is torch.Size([50264, 1024]).
```