Encoding error while fine-tuning

claudia · July 29, 2021, 11:04am

Hello there! I have a question about fine-tuning mBART. I trained following the example at transformers/examples/pytorch/translation at v4.6.1 · huggingface/transformers · GitHub and obtained a model file, pytorch_model.bin.
However, when I try to use the model to translate, I get a UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
The complete error is below. As far as I can see, it occurs when loading the model I obtained from fine-tuning.

Traceback (most recent call last):
  File "mbart/predict.py", line 41, in <module>
    main()
  File "mbart/predict.py", line 21, in main
    model = MBartForConditionalGeneration.from_pretrained(opt.model)
  File "/home/claudia/anaconda3/envs/speechenv/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1080, in from_pretrained
    **kwargs,
  File "/home/claudia/anaconda3/envs/speechenv/lib/python3.7/site-packages/transformers/configuration_utils.py", line 427, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/claudia/anaconda3/envs/speechenv/lib/python3.7/site-packages/transformers/configuration_utils.py", line 495, in get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/claudia/anaconda3/envs/speechenv/lib/python3.7/site-packages/transformers/configuration_utils.py", line 578, in _dict_from_json_>
    text = reader.read()
  File "/home/claudia/anaconda3/envs/speechenv/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

Any ideas on how to solve this?

Thanks in advance

snowgo · August 8, 2021, 1:03pm

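Maybe your input path is xxxxxxxxxxxxx/pytorch_model.bin.
Change it to xxxxxxxxxxxxx/ (the directory that contains the file). The traceback shows from_pretrained trying to read the path as a JSON config, which fails on a binary weights file.
Give it a try.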
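
In case it helps, here is a minimal sketch of a guard for this, assuming the script receives the model path as a string (like opt.model in the traceback). The helper name resolve_model_dir is made up for illustration and is not part of transformers; it only swaps a file path for its parent directory before the path is handed to from_pretrained.

```python
import os


def resolve_model_dir(path: str) -> str:
    """Hypothetical helper: return a directory path to pass to from_pretrained."""
    # If the caller passed the weights file itself (e.g. .../pytorch_model.bin),
    # fall back to the directory that contains it.
    if os.path.isfile(path):
        return os.path.dirname(os.path.abspath(path))
    # Otherwise leave the value untouched (a local directory or a model name).
    return path


# Assumed usage inside predict.py (left commented out, since it needs
# transformers installed and a fine-tuned model on disk):
# from transformers import MBartForConditionalGeneration
# model = MBartForConditionalGeneration.from_pretrained(resolve_model_dir(opt.model))
```

The directory should hold config.json next to pytorch_model.bin, which the training example normally writes to the output directory.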

claudia · August 10, 2021, 1:52pm

Thanks, this solved it.
