t5 distillation is very feasible; I just got excited about bart/pegasus since they performed the best in my summarization experiments. There is no feasibility issue.
It is much less feasible to distill from t5 -> bart than to distill from a large finetuned t5 checkpoint to a smaller one.
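To make the "large finetuned checkpoint to a smaller one" direction concrete, here is a rough sketch (model names and layer counts are arbitrary, and this is not the repo's actual distillation code) of building a student by copying alternating decoder layers from a t5 teacher:

```python
# Illustrative sketch: shrink a (finetuned) T5 checkpoint by creating a student
# with fewer decoder layers and copying alternating teacher decoder layers into
# it. Model name and layer counts are placeholders.
from transformers import T5Config, T5ForConditionalGeneration

teacher = T5ForConditionalGeneration.from_pretrained("t5-base")  # stand-in for a finetuned checkpoint

# Student keeps the full encoder but only half of the decoder layers.
student_config = T5Config.from_pretrained("t5-base", num_decoder_layers=6)
student = T5ForConditionalGeneration(student_config)

# Copy every weight whose name matches (embeddings, encoder, lm_head, first decoder layers).
student.load_state_dict(teacher.state_dict(), strict=False)

# Map teacher decoder layers 0, 2, 4, ... onto student decoder layers 0, 1, 2, ...
for student_idx, teacher_idx in enumerate(range(0, teacher.config.num_decoder_layers, 2)):
    student.decoder.block[student_idx].load_state_dict(
        teacher.decoder.block[teacher_idx].state_dict()
    )
```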
I don’t have any particular task in mind, yet. Just exploring for now.
> There is no feasibility issue.
I see … thanks for clarifying it.
> I just got excited about bart/pegasus since they performed the best in my summarization experiments
Are you suggesting that you got better results with BART, compared with T5?
Re. distilling T5: I guess one limitation here is that T5 (11B, the teacher) would not fit on many common GPUs, right? I wonder if it is possible to pre-extract the teacher logits (say, on a TPU) and just load them in the distiller code. Do you have any thoughts on this issue, @sshleifer?
> I wonder if it is possible to pre-extract the teacher logits (say, on a TPU) and just load them in the distiller code.
I think this can be achieved with the datasets library: we can try to cache the logits along with the examples and, while loading an example, load its corresponding logits as well, so the dataset could return a dict that looks something like the sketch below.
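Something along these lines, just as a sketch (the `teacher_logits` field name, the use of t5-small, and the toy data are assumptions, not existing code):

```python
# Rough sketch: precompute teacher logits with the `datasets` library and store
# them next to the tokenized example, so the distiller can load them back
# without running the teacher again.
import torch
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
teacher = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

raw = Dataset.from_dict({
    "source": ["summarize: The cat sat on the mat all day."],
    "target": ["A cat sat on a mat."],
})

def add_teacher_logits(example):
    enc = tokenizer(example["source"], return_tensors="pt")
    dec = tokenizer(example["target"], return_tensors="pt")
    with torch.no_grad():
        out = teacher(input_ids=enc.input_ids,
                      attention_mask=enc.attention_mask,
                      labels=dec.input_ids)
    return {
        "input_ids": enc.input_ids[0].tolist(),
        "attention_mask": enc.attention_mask[0].tolist(),
        "labels": dec.input_ids[0].tolist(),
        # shape (target_len, vocab_size): what the student would be trained to match
        "teacher_logits": out.logits[0].tolist(),
    }

# `map` caches the result, so the logits are computed once and reloaded afterwards.
cached = raw.map(add_teacher_logits)
```

One caveat: full logits are vocab_size floats per target token, so for a real dataset you would probably store only the top-k values, or go straight to pseudolabels.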
Yes, the bart variants finetuned on cnn and xsum perform better than the t5 variants (that aren't finetuned).
They are slightly better than finetuned t5 variants.
I don't see any reason to use t5-11b; is it better on some task than t5-large?
Note that if you want to use the current distillation code, you have to fit the teacher, the student, and `batch_size=1` on a single GPU, which is unfortunate.
@valhalla can also just generate teacher pseudolabels (less memory than logits, no new code), which I have run some recent experiments on with good results. I will likely check in code for that soon.
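The pseudolabel route is roughly the following (a sketch only; the model name and generation settings are placeholders, not the actual script):

```python
# Sketch of the pseudolabel approach: the finetuned teacher generates summaries
# for the training inputs, and the student is then finetuned on those generated
# targets with the ordinary finetuning script. No logits need to be stored.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-xsum")
teacher = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-xsum").eval()

sources = [
    "The local council approved the new cycling lanes after a lengthy debate ...",
]

batch = tokenizer(sources, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    generated = teacher.generate(**batch, num_beams=4, max_length=62)

pseudolabels = tokenizer.batch_decode(generated, skip_special_tokens=True)
# Write `sources` to train.source and `pseudolabels` to train.target, then run
# the regular seq2seq finetuning on the smaller student.
```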
hello, I am trying to run your distillation code with T5. As a POC I am just trying to distill from t5-small to t5-small before doing actual work. I have a script which looks like the following:
```
python distillation.py \
--teacher t5-small --data_dir $CNN_DIR \
--student_decoder_layers 6 --student_encoder_layers 6 \
--learning_rate=3e-4 \
--do_train \
--do_predict \
--fp16 \
--model_name_or_path t5-small \
--val_check_interval 0.1 \
--output_dir distilt5 \
"$@"
```
and get the following error:
```
Traceback (most recent call last):
File "/home/sumithrab/transformers/src/transformers/configuration_utils.py", line 349, in get_config_dict
resolved_config_file = cached_path(
File "/home/sumithrab/transformers/src/transformers/file_utils.py", line 832, in cached_path
raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file t5-small/config.json not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "distillation.py", line 297, in <module>
distill_main(args)
File "distillation.py", line 287, in distill_main
model = create_module(args)
File "distillation.py", line 254, in create_module
model = module_cls(args)
File "distillation.py", line 42, in __init__
teacher = AutoModelForSeq2SeqLM.from_pretrained(hparams.teacher).eval()
File "/home/sumithrab/transformers/src/transformers/modeling_auto.py", line 1094, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/sumithrab/transformers/src/transformers/configuration_auto.py", line 318, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/sumithrab/transformers/src/transformers/configuration_utils.py", line 368, in get_config_dict
raise EnvironmentError(msg)
OSError: Can't load config for 't5-small'. Make sure that:
- 't5-small' is a correct model identifier listed on 'https://huggingface.co/models'
- or 't5-small' is the correct path to a directory containing a config.json file
```
Any clue as to what I am missing?
Am I supposed to first download the (pretrained) t5-small model locally? If so, from where and to what path, and how do I specify the model in this script?
```
Traceback (most recent call last):
File "/home/sumithrab/transformers/src/transformers/configuration_utils.py", line 349, in get_config_dict
resolved_config_file = cached_path(
File "/home/sumithrab/transformers/src/transformers/file_utils.py", line 832, in cached_path
raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file t5-small/config.json not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "distillation.py", line 306, in <module>
distill_main(args)
File "distillation.py", line 296, in distill_main
model = create_module(args)
File "distillation.py", line 263, in create_module
model = module_cls(args)
File "/home/sumithrab/transformers/examples/seq2seq/finetune.py", line 63, in __init__
super().__init__(hparams, num_labels=None, mode=self.mode, **kwargs)
File "/home/sumithrab/transformers/examples/lightning_base.py", line 83, in __init__
self.config = AutoConfig.from_pretrained(
File "/home/sumithrab/transformers/src/transformers/configuration_auto.py", line 318, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/sumithrab/transformers/src/transformers/configuration_utils.py", line 368, in get_config_dict
raise EnvironmentError(msg)
OSError: Can't load config for 't5-small'. Make sure that:
- 't5-small' is a correct model identifier listed on 'https://huggingface.co/models'
- or 't5-small' is the correct path to a directory containing a config.json file
```
If I were in your position, I would try again after `rm -rf t5-small` (the `file t5-small/config.json not found` error suggests a stray local `t5-small` directory is shadowing the hub model id),
then verify in a Python REPL that `AutoConfig.from_pretrained('t5-small')` doesn't work (see the snippet below).
Then make a reproducible GitHub issue, including the `transformers-cli env` output.
This is not a distillation issue, and I can’t reproduce it on master.
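For reference, the REPL check suggested above is just:

```python
# If this resolves, "t5-small" is being found on the hub and the failure is
# specific to how the distillation script is being run (e.g. a stray local
# `t5-small` directory shadowing the model id).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("t5-small")
print(config.model_type)  # expected: "t5"
```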
According to your command line, it looks like the `--no_teacher` option is passed when there is a teacher, and turned off when training the no-teacher model. Is there a mistake there?