BART pre-training?

Hi,
How can I pre-train BART with my own dataset? It seems that the script examples/language-modeling/run_language_modeling.py doesn’t support it yet. Thanks.


Hi @cahya, BART pre-training is not yet available in transformers. You can find the denoising dataset code here in the fairseq repo and try to use it.
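
In the meantime, a rough way to experiment is to corrupt the text yourself and train BartForConditionalGeneration as an ordinary seq2seq denoiser. Below is a minimal word-level sketch of BART’s text-infilling corruption; it only approximates fairseq’s implementation, which samples span lengths from a Poisson(λ=3) distribution and also permutes sentences:

```python
import random

from transformers import BartTokenizer, BartForConditionalGeneration

def bart_text_infill(words, mask_prob=0.3, mask_token="<mask>"):
    # Mask each word i.i.d., then collapse each run of masked words
    # into a single mask token (BART's "text infilling" corruption).
    is_masked = [random.random() < mask_prob for _ in words]
    corrupted, i = [], 0
    while i < len(words):
        if is_masked[i]:
            corrupted.append(mask_token)  # one <mask> per masked span
            while i < len(words) and is_masked[i]:
                i += 1
        else:
            corrupted.append(words[i])
            i += 1
    # Encoder sees the corrupted text; the decoder target is the original.
    return " ".join(corrupted), " ".join(words)

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

src, tgt = bart_text_infill("the quick brown fox jumps over the lazy dog".split())
batch = tok(src, return_tensors="pt")
labels = tok(tgt, return_tensors="pt").input_ids
# Cross-entropy loss on reconstructing the original sequence.
loss = model(**batch, labels=labels).loss
```

For real training you would do this corruption inside a data collator over tokenized batches rather than on raw strings, but the objective is the same.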

Thanks for the info and the link to the denoising dataset. Maybe @sshleifer can also tell us about his experience with BART and transformers?
And how about T5? I see that there are already several T5 models; can we pre-train T5 with our own dataset using transformers?

Adding both of these tasks (T5 and BART pre-training) is on my todo list. It might take some time though.
If you are able to create the span masking code for T5 then you can easily pre-train T5 with Transformers.
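
For anyone attempting that, here is a minimal word-level sketch of T5-style span corruption. The real objective operates on SentencePiece tokens, masking about 15% of them in spans of mean length 3, so treat this only as a starting point:

```python
import random

def t5_span_corrupt(words, mask_prob=0.15):
    # Mask each word i.i.d., then replace each run of masked words with
    # a sentinel token; the target lists each sentinel followed by the
    # words it hides, ending with a final sentinel (T5's target format).
    is_masked = [random.random() < mask_prob for _ in words]
    inputs, targets, sentinel, i = [], [], 0, 0
    while i < len(words):
        if is_masked[i]:
            tok = f"<extra_id_{sentinel}>"
            inputs.append(tok)
            targets.append(tok)
            while i < len(words) and is_masked[i]:
                targets.append(words[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(words[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")
    return " ".join(inputs), " ".join(targets)
```

The resulting (input, target) pair can be fed to T5ForConditionalGeneration the same way as in the BART sketch above, with the tokenized target passed as labels; the <extra_id_N> sentinels are already in the T5 tokenizer’s vocabulary.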


Great, I’ll try it.

Any updates on this?