@sshleifer
Hi,
I would like to run the method command lines of Direct Knowledge Distillation (KD) which you mentioned at https://github.com/huggingface/transformers/tree/master/examples/seq2seq
" ./train_distilbart_xsum.sh --logger_name wandb --gpus 1"
and meet an error,
"OSError: Model name 'distilbart_xsum_12_6/student' was not found in tokenizers model name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). We assumed 'distilbart_xsum_12_6/student' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url."
I tried to fix it by adding this line at line 46 of distillation.py:
`hparams.tokenizer_name = hparams.teacher  # Use teacher's tokenizer`
It seems to fix the problem, but I am not entirely sure it is correct.
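For context, a minimal sketch of what that one-line fix does (the `hparams` attribute names follow the script's argparse namespace as quoted above; the helper function name here is hypothetical, not from the repo):

```python
from argparse import Namespace


def resolve_tokenizer_name(hparams: Namespace) -> str:
    """Pick the tokenizer to load for the student model.

    The freshly created student directory contains model weights but no
    vocab.json/merges.txt, so loading a tokenizer from it fails. Since
    distillation keeps the teacher's vocabulary, falling back to the
    teacher's tokenizer is safe.
    """
    if getattr(hparams, "tokenizer_name", None) is None:
        hparams.tokenizer_name = hparams.teacher  # use teacher's tokenizer
    return hparams.tokenizer_name
```

With this in place, `AutoTokenizer.from_pretrained(hparams.tokenizer_name)` resolves to the teacher checkpoint (e.g. `facebook/bart-large-xsum`) instead of the vocab-less student directory.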
Yes, your fix is perfect! Will fix, thanks for reporting this!
Thanks very much!
However, the code has another issue:
I cannot run it with `--fp16 --fp16_opt_level=O1`; it hits an OOM error, but it runs fine without those flags. This seems quite strange and confusing.
BTW, a similar issue is reported at https://github.com/huggingface/transformers/issues/8403
Really look forward to your reply
thanks again
Yes, as that issue suggests, the only workaround at the moment is to use torch 1.5 + Apex.
You should follow that issue for subsequent updates.
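For anyone hitting the same OOM, a rough sketch of that environment setup (pinning torch 1.5 and building NVIDIA Apex from source; the exact torch build and CUDA version you need depend on your system, so treat these pins as an example, not a prescription):

```shell
# Pin torch to the 1.5 series (pick the wheel matching your CUDA toolkit).
pip install "torch==1.5.1"

# Build NVIDIA Apex from source with its fused CUDA extensions,
# following the install command from Apex's README.
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

With Apex installed, the script's `--fp16 --fp16_opt_level=O1` flags use Apex AMP instead of native torch AMP, which is what the linked issue reports as working.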
Thanks a lot!