This thread should be used to ask questions about how examples/seq2seq/distillation.py
works, and to ask questions about the associated paper after it gets released.
What is the reasoning behind choosing alternating layers?
Are there no-teacher distillation scores for XSUM?
The no-teacher approach also works for a non-seq2seq task, as we saw with MNLI. Should we check whether it works on other tasks as well?
Alternating layers seem to perform best, by a moderate margin.
Definitely interested to see results for other tasks!
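For anyone wondering what "alternating layers" means in practice, here is a minimal sketch, assuming it simply means copying every second teacher layer into the student. The helper names `alternating_layers` and `init_student_layers` are hypothetical and are not taken from distillation.py, which may pick layers differently. Plain strings stand in for the real layer modules.

```python
def alternating_layers(n_teacher_layers, start=0):
    """Return the indices of every second teacher layer.

    Hypothetical helper for illustration only; the actual script may
    use a different layer-selection rule.
    """
    if n_teacher_layers <= 0:
        raise ValueError("n_teacher_layers must be positive")
    return list(range(start, n_teacher_layers, 2))


def init_student_layers(teacher_layers, start=0):
    """Build the student's layer list by copying alternating teacher layers."""
    indices = alternating_layers(len(teacher_layers), start)
    return [teacher_layers[i] for i in indices]


# Stand-in for a 12-layer teacher; real code would hold layer modules here.
teacher = [f"layer_{i}" for i in range(12)]
student = init_student_layers(teacher)
print(student)
```

With a 12-layer teacher this gives a 6-layer student. Passing `start=1` would pick the odd-numbered layers instead.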
Has this been relocated to examples/research_projects/seq2seq-distillation/distillation.py?
Yes, that project has now been moved to the research_projects dir.