How can I pre-train BART on our own dataset? It seems that the script examples/language-modeling/run_language_modeling.py doesn’t support it yet. Thanks.
Thanks for the info and the link to the denoising dataset. Maybe @sshleifer can also tell us about his experience with BART and transformers?
And how about T5? I see that you already have several T5 models. Can we pre-train T5 on our own dataset using transformers?
Adding both of these tasks (T5 and BART pre-training) is on my to-do list. It might take some time, though.
If you are able to write the span-masking code for T5, then you can easily pre-train T5 with Transformers.
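In case it helps as a starting point, here is a minimal sketch of what span masking for T5 could look like. It is not code from Transformers: the function names `random_noise_mask` and `span_corrupt` are hypothetical, it works on a plain list of string tokens rather than tokenizer ids, and the random mask is a simplification (spans may merge or be cut off at the end, so the actual noise ratio is only approximate). The sentinel names `<extra_id_0>`, `<extra_id_1>`, ... follow the T5 tokenizer vocabulary, and the defaults of 15% noise and span length 3 are the values reported in the T5 paper.

```python
import random


def random_noise_mask(n_tokens, noise_density=0.15, span_length=3, seed=None):
    """Return a list of booleans marking which tokens to corrupt.

    Simplified on purpose: spans have a fixed length and may overlap or
    touch, in which case they are later treated as one longer span.
    """
    if n_tokens == 0:
        return []
    rng = random.Random(seed)
    mask = [False] * n_tokens
    # Roughly noise_density * n_tokens masked tokens, grouped into spans.
    n_spans = max(1, round(n_tokens * noise_density / span_length))
    for _ in range(n_spans):
        start = rng.randrange(n_tokens)
        for i in range(start, min(n_tokens, start + span_length)):
            mask[i] = True
    return mask


def span_corrupt(tokens, mask):
    """Build (inputs, targets) in the T5 span-corruption format.

    Each run of masked tokens is replaced by one sentinel in the inputs.
    The targets list each sentinel followed by the tokens it replaced,
    and end with one extra closing sentinel.
    """
    inputs, targets = [], []
    sentinel_id = 0
    prev_masked = False
    for token, masked in zip(tokens, mask):
        if masked:
            if not prev_masked:
                # A new masked span starts here: emit a fresh sentinel.
                sentinel = f"<extra_id_{sentinel_id}>"
                inputs.append(sentinel)
                targets.append(sentinel)
                sentinel_id += 1
            targets.append(token)
        else:
            inputs.append(token)
        prev_masked = masked
    # Closing sentinel that marks the end of the targets.
    targets.append(f"<extra_id_{sentinel_id}>")
    return inputs, targets


if __name__ == "__main__":
    tokens = "Thank you for inviting me to your party last week .".split()
    mask = random_noise_mask(len(tokens), seed=0)
    inputs, targets = span_corrupt(tokens, mask)
    print(" ".join(inputs))
    print(" ".join(targets))
```

For real pre-training you would apply the same idea to token ids from the T5 tokenizer, then pass the corrupted inputs as `input_ids` and the targets as `labels` to `T5ForConditionalGeneration`.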
Great, I’ll try it