Seq2Seq Distillation: train_distilbart_xsum error

@sshleifer
Hi,
I would like to run the Direct Knowledge Distillation (KD) command you mention at https://github.com/huggingface/transformers/tree/master/examples/seq2seq:
"./train_distilbart_xsum.sh --logger_name wandb --gpus 1"
but I encounter an error:
"OSError: Model name 'distilbart_xsum_12_6/student' was not found in tokenizers model name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). We assumed 'distilbart_xsum_12_6/student' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url."
I tried to fix it by adding the following line around line 46 of distillation.py:
"hparams.tokenizer_name = hparams.teacher  # Use teacher's tokenizer"
It seems to fix the problem, but I am not completely sure.
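For context, here is a minimal sketch of where that line would sit in distillation.py; the class name and surrounding lines are paraphrased from memory and may not match the repo exactly, only the hparams.tokenizer_name assignment is the actual proposed change:

```python
# Sketch of the fix (surrounding code paraphrased, not a verbatim copy of distillation.py).
class SummarizationDistiller(SummarizationModule):
    def __init__(self, hparams):
        # The student directory (e.g. distilbart_xsum_12_6/student) is created on the fly
        # and contains no tokenizer files, so point the tokenizer at the teacher model,
        # which does ship vocab.json / merges.txt.
        hparams.tokenizer_name = hparams.teacher  # Use teacher's tokenizer
        super().__init__(hparams)
```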

Yes, your fix is perfect! Will fix, thanks for reporting this!

Thanks very much!
However, I ran into another issue with this code:
I cannot run it with --fp16 --fp16_opt_level=O1; it runs out of memory (OOM), but it runs fine without those flags. This seems quite strange and confusing.
BTW, a similar issue is reported at https://github.com/huggingface/transformers/issues/8403
Really looking forward to your reply.
Thanks again!


@sshleifer

Yes, as that issue suggests, the only workaround at the moment is to use torch 1.5 + Apex.
You should follow that issue for subsequent updates.
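
If it helps anyone hitting the same OOM, here is a small sanity-check script for the workaround; just a sketch, the expected version comes from the suggestion above and assumes the standard torch and apex packages:

```python
# Quick check that the suggested environment (torch 1.5 + NVIDIA Apex) is in place.
import torch

print("torch version:", torch.__version__)          # expect something like 1.5.x
print("CUDA available:", torch.cuda.is_available())

try:
    from apex import amp  # Apex mixed-precision module used by --fp16_opt_level=O1
    print("Apex amp importable:", amp is not None)
except ImportError:
    print("Apex not installed; build it from https://github.com/NVIDIA/apex")
```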

Thanks a lot!