SpanBERT, ELECTRA, MARGE from scratch?

HenryAI · January 4, 2022, 4:57pm

Hey everyone! I am incredibly grateful for this tutorial on training a language model from scratch: How to train a new language model from scratch using Transformers and Tokenizers

I would really like to extend this to contiguous masking of longer token sequences (e.g. [mask-5], [mask-8]). I have started looking into how to write a custom DataCollator for this, but I suspect I will also need to make some changes to the model.
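Below is a minimal sketch of what the masking step of such a collator might do. It assumes SpanBERT-style span masking as described in the SpanBERT paper: span lengths are drawn from a geometric distribution (p = 0.2, clipped at 10), about 15% of tokens are masked in total, and one 80/10/10 [MASK]/random/keep decision is made per span rather than per token. The helpers `sample_span_length` and `span_mask` are hypothetical names, not a Transformers API. Unlike SpanBERT, which samples spans of whole words, this sketch samples spans over subword tokens. Unmasked positions get the label -100, which PyTorch's cross-entropy loss ignores by default.

```python
import random
from typing import List, Optional, Tuple

# Label value ignored by PyTorch's CrossEntropyLoss (default ignore_index).
IGNORE_INDEX = -100


def sample_span_length(rng: random.Random, p: float = 0.2, max_len: int = 10) -> int:
    """Draw a span length from Geo(p) on {1, 2, ...}, clipped at max_len."""
    length = 1
    # Continue with probability 1 - p, so P(length = k) = (1 - p)^(k - 1) * p.
    while length < max_len and rng.random() > p:
        length += 1
    return length


def span_mask(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    mask_ratio: float = 0.15,
    p: float = 0.2,
    max_span: int = 10,
    special_ids: frozenset = frozenset(),
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical SpanBERT-style span masking over one token sequence.

    Returns (input_ids, labels): labels hold the original token at masked
    positions and IGNORE_INDEX everywhere else.
    """
    if rng is None:
        rng = random.Random()
    n = len(token_ids)
    candidates = [i for i, t in enumerate(token_ids) if t not in special_ids]
    budget = max(1, round(mask_ratio * len(candidates))) if candidates else 0

    masked = set()
    spans = []  # (start, end) pairs, end inclusive
    attempts = 0
    while len(masked) < budget and attempts < 10 * n:
        attempts += 1
        # Never overshoot the masking budget.
        length = min(sample_span_length(rng, p, max_span), budget - len(masked))
        start = rng.choice(candidates)
        span = range(start, min(start + length, n))
        # Reject spans that overlap earlier spans or cover special tokens.
        if any(i in masked or token_ids[i] in special_ids for i in span):
            continue
        masked.update(span)
        spans.append((span.start, span.stop - 1))

    input_ids = list(token_ids)
    labels = [IGNORE_INDEX] * n
    for start, end in spans:
        r = rng.random()  # one replacement decision for the whole span
        for i in range(start, end + 1):
            labels[i] = token_ids[i]
            if r < 0.8:
                input_ids[i] = mask_token_id
            elif r < 0.9:
                input_ids[i] = rng.randrange(vocab_size)
            # else: keep the original token
    return input_ids, labels
```

Inside a custom collator you would call `span_mask` on each example, then pad and convert the results to tensors. Because every masked position still predicts exactly one original token, a standard masked-LM head such as `BertForMaskedLM` can consume these labels unchanged. Model changes only come in if you also want a span-level objective, or if a whole span is collapsed into a single mask token.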

Has anyone looked into this and can point me to any resources?

Thank you!

HenryAI · January 4, 2022, 10:18pm

Found something useful on StackOverflow for this:

HenryAI · January 6, 2022, 2:07pm

This tutorial on Keras Code Examples is the most useful thing I have found so far on this:

merve · January 8, 2022, 9:45pm

Hello Connor! Great to see you here! I won’t be able to help you myself, but I’m going to ping @nielsr here; maybe he can help. Sorry for the delay!

nielsr · January 9, 2022, 12:50pm

Hi,

I’ve seen that SpanBERT models are on the Hub, but we haven’t added the model itself to the library yet.

This would actually be a great project:

- Contribute SpanBERT to HuggingFace Transformers, based on the modeling file. This will be relatively easy, as the authors already used HuggingFace’s implementation of BERT and tweaked it a little. The only difference is this class. We could then call the model SpanBertModel in the library, and add a SpanBertForPreTraining, similar to BertForPreTraining, that includes the heads necessary for pre-training.
- Add a script to the examples directory, which could be called run_span_mlm.py (similar to run_mlm.py). This can be based on the files defined here (Facebook open-sourced everything!).
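For context on the pre-training heads: the SpanBERT paper adds a span boundary objective (SBO), in which each token inside a masked span is predicted from the encoder outputs of the two tokens just outside the span, plus a position embedding for the token's offset within the span. These are fed through a two-layer feed-forward network with GeLU and layer normalization. A head along these lines needs, for every target token, the positions of its two boundary tokens and its relative offset. Here is a minimal pure-Python sketch of building those indices. It assumes inclusive (start, end) span positions such as a span-masking collator could emit. `sbo_targets` is a hypothetical helper, and its offsets are 0-based, whereas the paper writes the position embedding as p_{i-s+1}.

```python
from typing import List, Tuple


def sbo_targets(
    spans: List[Tuple[int, int]], seq_len: int, max_span: int = 10
) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: for each token in each inclusive (start, end) span,
    return (left_boundary, right_boundary, offset_in_span, target_position)."""
    out = []
    for start, end in spans:
        left, right = start - 1, end + 1
        # Both boundary tokens must lie inside the sequence.
        if left < 0 or right >= seq_len:
            continue
        for i in range(start, end + 1):
            offset = i - start
            # The position-embedding table only covers max_span offsets.
            if offset >= max_span:
                break
            out.append((left, right, offset, i))
    return out
```

In a PyTorch head, these tuples would be used to gather the hidden states at `left` and `right`, look up a position embedding at `offset`, and compute cross-entropy against the original token at `target_position`. The SBO loss is added to the usual masked-LM loss.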

If anyone is interested in contributing, let me know!

Aloka · July 22, 2023, 1:35am

@nielsr I am also interested in fine-tuning BERT (or any BERT-like pre-trained model) using span masking. Could you tell me whether this is supported in the Transformers library? If so, could you point me to any available resources?
