Hi @Hildweig, it was fine-tuned the way you fine-tune/train any encoder-decoder Transformer model.
On a high level, the encoder takes the input sequence and creates a hidden representation of it.
The decoder then receives the encoder representation and is trained to generate the output sequence auto-regressively using teacher forcing.
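In case it helps, here's a rough sketch of what a single teacher-forcing training step looks like with the Transformers API. The checkpoint name, the toy input/target pair, and the learning rate below are just placeholders for illustration, not the actual setup used here:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # placeholder checkpoint, not the one used here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Encoder input and target output for one toy example.
inputs = tokenizer("translate English to German: Hello, how are you?",
                   return_tensors="pt")
labels = tokenizer("Hallo, wie geht es dir?", return_tensors="pt").input_ids

# Passing `labels` makes the model shift them right internally to build the
# decoder inputs, so at every step the decoder is fed the gold previous
# token (teacher forcing) and the loss is computed against the true targets.
outputs = model(**inputs, labels=labels)
loss = outputs.loss

# One optimizer step (lr is a placeholder).
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```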
You may have already read this, but The Illustrated Transformer by Jay Alammar explains it really well.
All the training hparams are in this file.