BART pre-training?

Hi,
How can I pre-train BART with my own dataset? It seems that the script examples/language-modeling/run_language_modeling.py doesn’t support it yet. Thanks.


Hi @cahya, BART pre-training is not yet available in transformers. You can find the denoising dataset code here in the fairseq repo and try to use it.
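
In the meantime, a rough way to experiment is to corrupt the text yourself and train BartForConditionalGeneration as an ordinary seq2seq denoiser. Below is a minimal word-level sketch of BART’s text-infilling corruption; it only approximates fairseq’s implementation, which samples span lengths from a Poisson(λ=3) distribution and also permutes sentences:

```python
import random

from transformers import BartTokenizer, BartForConditionalGeneration

def bart_text_infill(words, mask_prob=0.3, mask_token="<mask>"):
    # Mask each word i.i.d., then collapse each run of masked words
    # into a single mask token (BART's "text infilling" corruption).
    is_masked = [random.random() < mask_prob for _ in words]
    corrupted, i = [], 0
    while i < len(words):
        if is_masked[i]:
            corrupted.append(mask_token)  # one <mask> per masked span
            while i < len(words) and is_masked[i]:
                i += 1
        else:
            corrupted.append(words[i])
            i += 1
    # Encoder sees the corrupted text; the decoder target is the original.
    return " ".join(corrupted), " ".join(words)

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

src, tgt = bart_text_infill("the quick brown fox jumps over the lazy dog".split())
batch = tok(src, return_tensors="pt")
labels = tok(tgt, return_tensors="pt").input_ids
# Cross-entropy loss on reconstructing the original sequence.
loss = model(**batch, labels=labels).loss
```

For real training you would do this corruption inside a data collator over tokenized batches rather than on raw strings, but the objective is the same.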

Thanks for the info and the link to the denoising dataset. Maybe @sshleifer can also tell us about his experience with BART and transformers?
And how about T5? I see that there are already several T5 models; can we pre-train T5 with our own dataset using transformers?

Adding both of these tasks (T5 and BART pre-training) is on my todo list. It might take some time though.
If you are able to create the span masking code for T5 then you can easily pre-train T5 with Transformers.
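
For anyone attempting that, here is a minimal word-level sketch of T5-style span corruption. The real objective operates on SentencePiece tokens, masking about 15% of them in spans of mean length 3, so treat this only as a starting point:

```python
import random

def t5_span_corrupt(words, mask_prob=0.15):
    # Mask each word i.i.d., then replace each run of masked words with
    # a sentinel token; the target lists each sentinel followed by the
    # words it hides, ending with a final sentinel (T5's target format).
    is_masked = [random.random() < mask_prob for _ in words]
    inputs, targets, sentinel, i = [], [], 0, 0
    while i < len(words):
        if is_masked[i]:
            tok = f"<extra_id_{sentinel}>"
            inputs.append(tok)
            targets.append(tok)
            while i < len(words) and is_masked[i]:
                targets.append(words[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(words[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")
    return " ".join(inputs), " ".join(targets)
```

The resulting (input, target) pair can be fed to T5ForConditionalGeneration the same way as in the BART sketch above, with the tokenized target passed as labels; the <extra_id_N> sentinels are already in the T5 tokenizer’s vocabulary.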


Great, I’ll try it.

Any updates on this?