For the Community Week project I’d like to extend the work of the McGill researchers who, by pre-training a BERT-style model on corpora with permuted word order (inducing effectively order-agnostic tokens), obtained high scores on a wide range of benchmark tasks (GLUE, PAWS, etc.). Their results suggest the model’s performance comes from learning distributional priors rather than from its ability to “discover the NLP pipeline”.
I’d like to replicate the authors’ work and perhaps extend it by evaluating the permuted models on other NLP and NLU benchmarks for which we have human-curated, gold-standard performance measures.
To start, the authors pre-trained models on permuted corpora that preserve sentence-level distributional information by randomly shuffling n-grams within each sentence, for n between 1 and 4. While they evaluated the permuted models against normally trained Transformer models in a wide range of settings (GLUE, etc.), I’d be interested to see how the models perform on human-curated benchmark datasets such as PAWS.
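To make the corpus-permutation step concrete, here is a minimal sketch of one plausible reading of the n-gram shuffling scheme: chunk each sentence into consecutive, non-overlapping n-grams and shuffle the order of those chunks, so unigram counts (and, for n > 1, some local word order) are preserved while sentence-level order is destroyed. The function name and signature are my own, not the authors’ code.

```python
import random

def permute_sentence(tokens, n, seed=None):
    """Shuffle a sentence by chunking it into consecutive n-grams
    and permuting the order of those chunks. With n=1 this is a full
    word shuffle; larger n preserves more local word order.

    This is an illustrative sketch, not the authors' implementation.
    """
    rng = random.Random(seed)
    # Split the token list into consecutive, non-overlapping n-grams
    # (the final chunk may be shorter than n).
    chunks = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng.shuffle(chunks)
    # Flatten the shuffled chunks back into a single token list.
    return [tok for chunk in chunks for tok in chunk]

sentence = "the quick brown fox jumps over the lazy dog".split()
print(permute_sentence(sentence, n=2, seed=0))
```

Note that the permutation preserves the sentence’s bag of words exactly, which is what lets the pre-trained model pick up distributional information despite the scrambled order.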