Seq2Seq Distillation: train_distilbart_xsum error

@sshleifer
Hi,
I would like to run the Direct Knowledge Distillation (KD) command you mention at https://github.com/huggingface/transformers/tree/master/examples/seq2seq:
"./train_distilbart_xsum.sh --logger_name wandb --gpus 1"
but I encounter an error:
"OSError: Model name 'distilbart_xsum_12_6/student' was not found in tokenizers model name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). We assumed 'distilbart_xsum_12_6/student' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url."
I tried to fix it by adding the following line around line 46 of distillation.py:
"hparams.tokenizer_name = hparams.teacher  # Use teacher's tokenizer"
It seems to fix the problem, but I am not completely sure.
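For context, here is a minimal sketch of where that line would sit in distillation.py; the class name and surrounding lines are paraphrased from memory and may not match the repo exactly, only the hparams.tokenizer_name assignment is the actual proposed change:

```python
# Sketch of the fix (surrounding code paraphrased, not a verbatim copy of distillation.py).
class SummarizationDistiller(SummarizationModule):
    def __init__(self, hparams):
        # The student directory (e.g. distilbart_xsum_12_6/student) is created on the fly
        # and contains no tokenizer files, so point the tokenizer at the teacher model,
        # which does ship vocab.json / merges.txt.
        hparams.tokenizer_name = hparams.teacher  # Use teacher's tokenizer
        super().__init__(hparams)
```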

Yes, your fix is perfect! Will fix, thanks for reporting this!

Thanks very much!
However, I ran into another issue with this code:
I cannot run it with --fp16 --fp16_opt_level=O1; it runs out of memory (OOM), but it runs fine without those flags. This seems quite strange and confusing.
BTW, a similar issue is reported at https://github.com/huggingface/transformers/issues/8403
Really looking forward to your reply.
Thanks again!


@sshleifer

Yes, as that issue suggests, the only workaround at the moment is to use torch 1.5 + Apex.
You should follow that issue for subsequent updates.
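
If it helps anyone hitting the same OOM, here is a small sanity-check script for the workaround; just a sketch, the expected version comes from the suggestion above and assumes the standard torch and apex packages:

```python
# Quick check that the suggested environment (torch 1.5 + NVIDIA Apex) is in place.
import torch

print("torch version:", torch.__version__)          # expect something like 1.5.x
print("CUDA available:", torch.cuda.is_available())

try:
    from apex import amp  # Apex mixed-precision module used by --fp16_opt_level=O1
    print("Apex amp importable:", amp is not None)
except ImportError:
    print("Apex not installed; build it from https://github.com/NVIDIA/apex")
```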

Thanks a lot!