Description: We will pretrain a large BART model on The Pile and measure the resulting downstream performance gains. Potentially, we could also add rotary embeddings.
Model: BART (1b+)
Dataset: The Pile
Training scripts: Training scripts will be written as part of the project. Data processing scripts can be adapted from GPT-J 6B.
Expected result: An adaptable JAX pipeline for training seq2seq models like BART on The Pile (rough sketch of the entry point below).
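A minimal sketch of what that entry point could look like, assuming the `transformers` Flax BART classes and a streamed copy of The Pile; the dataset id `"the_pile"`, the `facebook/bart-large` starting point, and the sequence length are placeholders, not final choices:

```python
# Rough pipeline sketch: stream The Pile and run a Flax BART forward pass.
from datasets import load_dataset
from transformers import BartTokenizerFast, FlaxBartForConditionalGeneration

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
# from_pt=True converts the PyTorch checkpoint if no Flax weights are published.
model = FlaxBartForConditionalGeneration.from_pretrained("facebook/bart-large", from_pt=True)

# Streaming avoids materialising the ~800 GB of raw text on disk at once.
pile = load_dataset("the_pile", split="train", streaming=True)

for example in pile.take(2):
    inputs = tokenizer(example["text"], truncation=True, max_length=512,
                       padding="max_length", return_tensors="np")
    # decoder_input_ids default to a right-shifted copy of input_ids inside the model.
    outputs = model(input_ids=inputs["input_ids"],
                    attention_mask=inputs["attention_mask"])
    print(outputs.logits.shape)  # (1, 512, vocab_size)
```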
To be honest, it's just because I am familiar with BART; I haven't really used T5 in practice yet. A lot of the stack I built last year still uses BART.
IMO one important consideration would be the output sequence length. AFAIU, for BART's denoising objective the output sequence length is the same as the input length, whereas for T5 it's much shorter (only the corrupted spans), which would lead to faster training.
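To make the length difference concrete, here is a toy illustration (plain Python, not the actual preprocessing code) of what the two objectives ask the decoder to produce:

```python
# Toy illustration of target lengths: BART text infilling vs. T5 span corruption.
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

# BART text infilling: corrupted spans are replaced by a single <mask>,
# but the decoder must reconstruct the *entire* original sequence.
bart_input = ["The", "<mask>", "fox", "jumps", "over", "the", "<mask>", "dog"]
bart_target = tokens                                    # length 9 == original length

# T5 span corruption: the decoder only predicts the dropped spans,
# delimited by sentinel tokens.
t5_input = ["The", "<X>", "fox", "jumps", "over", "the", "<Y>", "dog"]
t5_target = ["<X>", "quick", "brown", "<Y>", "lazy", "<Z>"]   # length 6 < 9

print(len(bart_target), len(t5_target))  # 9 6
```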
Adding rotary embeddings also seems like a good idea!
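For reference, a minimal JAX sketch of rotary position embeddings as they could be applied to the query/key tensors inside BART's attention; the function names and shapes here are made up for illustration and not tied to any existing HF module:

```python
# Minimal rotary position embedding sketch in JAX (rotate-half formulation).
# Shapes assumed: q, k are (batch, seq_len, num_heads, head_dim); head_dim must be even.
import jax.numpy as jnp

def rotary_angles(seq_len: int, head_dim: int, base: float = 10000.0):
    # One frequency per pair of dimensions, one angle per (position, frequency).
    inv_freq = 1.0 / (base ** (jnp.arange(0, head_dim, 2) / head_dim))
    angles = jnp.einsum("i,j->ij", jnp.arange(seq_len), inv_freq)  # (seq_len, head_dim/2)
    return jnp.sin(angles), jnp.cos(angles)

def apply_rotary(x, sin, cos):
    # Rotate-half: split the head dim into two halves and rotate them jointly.
    x1, x2 = jnp.split(x, 2, axis=-1)       # each (..., head_dim/2)
    sin = sin[None, :, None, :]             # broadcast over batch and heads
    cos = cos[None, :, None, :]
    return jnp.concatenate([x1 * cos - x2 * sin,
                            x2 * cos + x1 * sin], axis=-1)

# Example: rotate queries and keys before computing attention scores.
q = jnp.ones((2, 128, 16, 64))
k = jnp.ones((2, 128, 16, 64))
sin, cos = rotary_angles(seq_len=128, head_dim=64)
q_rot, k_rot = apply_rotary(q, sin, cos), apply_rotary(k, sin, cos)
```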
Also, what do you think about a deep encoder / shallow decoder setup? Looking at the ByT5 paper, it seems worth exploring.
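If we go that route, the asymmetry is just a config change in the existing HF API; the layer counts and dimensions below are only an example of the idea, not a proposal for the final model:

```python
# Deep encoder / shallow decoder expressed as a BartConfig (illustrative sizes only).
from transformers import BartConfig, FlaxBartForConditionalGeneration

config = BartConfig(
    d_model=768,
    encoder_layers=12,           # most of the capacity goes to the encoder
    decoder_layers=2,            # shallow decoder for cheaper generation
    encoder_attention_heads=12,
    decoder_attention_heads=12,
    vocab_size=50265,
)
model = FlaxBartForConditionalGeneration(config, seed=0)
print(model.config.encoder_layers, model.config.decoder_layers)  # 12 2
```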
@valhalla @patrickvonplaten We want to make some architectural improvements to the BART model, such as using DeBERTa's tokenizer and adding rotary embeddings. How would we use this model in HF after that?
Feel free to make any improvements you want. You could look at how FlaxBart is implemented and try to keep the same API (i.e. __call__, save_pretrained and from_pretrained); that way the model will stay compatible with the HF API. For now, you could create a new repo; we can always modify the code later to make it fully compatible.
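A minimal sketch of what "keeping the same API" could look like, assuming the custom model simply subclasses the existing Flax BART classes; all the `Rotary*` names and the vocab size are hypothetical:

```python
# Sketch: a custom BART variant that stays HF-compatible by inheriting
# __call__, save_pretrained and from_pretrained from FlaxBartForConditionalGeneration.
# The actual rotary / tokenizer changes would live in a custom Flax module
# plugged in via `module_class`; names here are placeholders.
from transformers import BartConfig, FlaxBartForConditionalGeneration

class RotaryBartConfig(BartConfig):
    def __init__(self, rotary_dim=64, **kwargs):
        super().__init__(**kwargs)
        self.rotary_dim = rotary_dim  # extra field survives save/load via the config JSON

class FlaxRotaryBartForConditionalGeneration(FlaxBartForConditionalGeneration):
    config_class = RotaryBartConfig
    # module_class = FlaxRotaryBartModule  # modified attention would go here

config = RotaryBartConfig(vocab_size=128100, encoder_layers=6, decoder_layers=2)
model = FlaxRotaryBartForConditionalGeneration(config, seed=0)

model.save_pretrained("rotary-bart-test")
reloaded = FlaxRotaryBartForConditionalGeneration.from_pretrained("rotary-bart-test")
print(reloaded.config.rotary_dim)  # 64
```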
You should talk to someone working at HF to get the invite. I do not think I am allowed to send or post the link. @valhalla, do you mind helping morgan?
I have some experience using BART for summarization on financial text. Very interested in learning more about pre-training seq2seq models and using TPU VMs. I would love to join the project if there is still room on the team.