Pre-train DistilmByT5Neo

wolosonovich · June 29, 2021, 2:53am

Let’s combine ByT5, mT5, GPT-Neo, and DistilBERT!

Model
For initial pre-training, we will use a randomly initialized ByT5 model, which we will distil once pre-training is complete.
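
The distillation objective is not pinned down in this proposal. As a rough sketch, assuming we follow the usual soft-target recipe (the one DistilBERT builds on), the student would be trained to match the teacher's temperature-softened output distribution. The function names and the temperature value below are illustrative assumptions, not part of any existing training script.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: a plain soft-target loss for a single output position.
    A real setup would average over positions and batch, and would likely
    mix in the ordinary cross-entropy loss on the ground-truth labels.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher, student)
        if p > 0.0
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, teacher))          # identical logits
    print(distillation_loss([3.0, 2.0, 1.0], teacher))  # mismatched logits
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge.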

Training Scripts
Training scripts will be created as part of the project.

Expected Result
A distilled model that combines the strengths of T5 and GPT-Neo while removing the need for a learned subword tokenizer, since the model reads raw UTF-8 bytes.
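
To make the "no tokenization" point concrete, here is a minimal sketch of byte-level encoding in the style of ByT5. It assumes the id layout that, to the best of my knowledge, the released ByT5 tokenizer uses: ids 0, 1, 2 reserved for pad, EOS and unk, and every UTF-8 byte shifted up by 3. The `encode` and `decode` helpers are hypothetical names for illustration, not the library API.

```python
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2  # reserved special ids (assumed ByT5 layout)
OFFSET = 3                        # byte value b maps to id b + OFFSET


def encode(text):
    """Map a string to byte-level ids and append EOS. No vocabulary file needed."""
    return [b + OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode(ids):
    """Invert `encode`, skipping special ids and anything outside the byte range."""
    raw = bytes(i - OFFSET for i in ids if OFFSET <= i < OFFSET + 256)
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = encode("hi")
    print(ids)
    print(decode(encode("Pre-train T5 for Italian 🇮🇹")))
```

Because every language is covered by the same 256 byte values, the same mapping works for any script, at the cost of longer sequences than a subword tokenizer would produce.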

UPDATE: We could just go for it and implement this with rotary embeddings as well.
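
For anyone unfamiliar with rotary embeddings, here is a small sketch of the idea, assuming the interleaved-pair formulation from the RoFormer paper: each pair of dimensions in a query or key vector is rotated by an angle proportional to the token position. How this would be wired into the T5 attention layers is still open, and the `rotary` and `dot` helpers are illustrative names only.

```python
import math


def rotary(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[i], vec[i+1]) by position * base**(-i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("rotary embeddings need an even dimension")
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5]
    k = [1.1, 0.4, -0.6, 0.9]
    # Same relative offset (3 positions apart) at two different absolute positions.
    print(dot(rotary(q, 5), rotary(k, 2)))
    print(dot(rotary(q, 13), rotary(k, 10)))
```

The two printed scores agree up to floating-point error, which is the property that makes this attractive: the attention score depends only on the relative distance between tokens, not on their absolute positions.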

wolosonovich · June 29, 2021, 7:12am

We could kick this up a notch further with rotary embeddings…DistilmRoByT5Neo?

patrickvonplaten · June 30, 2021, 7:36pm

Finalizing this, since another team member will join. @wolosonovich, it would be great if you could post the hub name of your team member here once you have it.

wolosonovich · June 30, 2021, 7:44pm

I will do that, thank you very much @patrickvonplaten !

wolosonovich · July 1, 2021, 6:27pm

@patrickvonplaten can you add @vmazelis to the spreadsheet? He is part of our team for this project as well.

patrickvonplaten · July 1, 2021, 10:41pm

should be done

wolosonovich · July 2, 2021, 6:30pm

@patrickvonplaten can you update the spreadsheet and replace Brett with @bneb10 when you have a chance? Thanks so much!

wolosonovich · July 2, 2021, 6:58pm

Link to our Discord server: Flax-HuggingFace-Community-Week
