Hi, I would like to try the approach suggested in “Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks” (link) for BART. I have my own dataset, but there are two things that are still unclear to me:
- I believe I should start with BartForConditionalGeneration, as that is the LM model. Is that right? (That is what I assume in the sketch below.)
- Can anyone provide more details on the noising algorithm that was used to train the model? The paper is pretty vague about it; these are the only details I found:
  - “A number of text spans are sampled, with span lengths drawn from a Poisson distribution (λ = 3)”
  - “We mask 30% of tokens in each document, and permute all sentences.”
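To make the question more concrete, here is a rough sketch of what I currently have in mind, based only on those two sentences. Everything beyond them is my own assumption: facebook/bart-base as the starting checkpoint, BartForConditionalGeneration as the model class, a naive split on “. ” for sentence boundaries, my own loop for drawing spans until roughly 30% of the tokens are masked, and a made-up example document. Corrections very welcome.

```python
import random

import numpy as np
from transformers import BartForConditionalGeneration, BartTokenizerFast

# Assumption on my part: continue pretraining from facebook/bart-base
# (bart-large should work the same way).
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

MASK_RATIO = 0.30    # "We mask 30% of tokens in each document"
POISSON_LAMBDA = 3   # "span lengths drawn from a Poisson distribution (λ = 3)"


def permute_sentences(text):
    """'Permute all sentences' -- here with a naive '. ' split (my assumption)."""
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(s.rstrip(".") for s in sentences) + "."


def text_infilling(tokens, mask_token):
    """Replace sampled spans with a single mask token each.

    Spans are drawn until roughly MASK_RATIO of the tokens are covered;
    a span of length 0 just inserts a mask token. The original fairseq
    implementation is more careful than this (whole-word spans, no double
    masking); this only illustrates the idea as I understand it.
    """
    tokens = list(tokens)
    num_to_mask = int(round(len(tokens) * MASK_RATIO))
    masked = 0
    while masked < num_to_mask:
        span_len = int(np.random.poisson(POISSON_LAMBDA))
        if span_len == 0:
            tokens.insert(random.randrange(len(tokens) + 1), mask_token)
            continue
        start = random.randrange(len(tokens))
        end = min(start + span_len, len(tokens))
        tokens[start:end] = [mask_token]
        masked += end - start
    return tokens


def noise_document(text):
    """Apply sentence permutation + text infilling, return the noised string."""
    tokens = tokenizer.tokenize(permute_sentences(text))
    noised = text_infilling(tokens, tokenizer.mask_token)
    return tokenizer.convert_tokens_to_string(noised)


# One training pair: noised document as input, original document as target.
original = "My domain-specific document. It has a few sentences. BART should learn to reconstruct it."
noised = noise_document(original)

inputs = tokenizer(noised, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
print(noised)
print(float(loss))
```

If this is roughly right, I would then wrap the noising in a data collator and run a normal seq2seq training loop over my domain corpus.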