Hello,
I’m using a custom tokenizer with a RoBERTa model, and I want to pre-train it on the masked language modeling (MLM) objective.
Is there a way to mask a span of tokens instead of randomly selecting which to mask?
I know there is a whole-word-masking collator (DataCollatorForWholeWordMask), but it is incompatible with my tokenizer, and I also want more flexibility in selecting the sequence length.
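In case it helps to show what I mean, here is a rough sketch of the kind of collator I have in mind — the class name, `span_length`, and `mask_ratio` are just placeholders I made up, not anything from the library:

```python
import random
import torch
from transformers import PreTrainedTokenizerBase


class SpanMaskingCollator:
    """Hypothetical sketch: mask contiguous spans of tokens instead of
    independently sampled positions."""

    def __init__(self, tokenizer: PreTrainedTokenizerBase,
                 span_length: int = 3, mask_ratio: float = 0.15):
        self.tokenizer = tokenizer
        self.span_length = span_length   # length of each masked span (made-up default)
        self.mask_ratio = mask_ratio     # rough fraction of tokens to mask per sequence

    def __call__(self, examples):
        # `examples` are dicts with "input_ids" coming straight from the tokenizer
        batch = self.tokenizer.pad(examples, return_tensors="pt")
        input_ids = batch["input_ids"]
        labels = input_ids.clone()

        # Never mask special tokens (<s>, </s>, padding, ...)
        special = torch.tensor(
            [self.tokenizer.get_special_tokens_mask(
                ids.tolist(), already_has_special_tokens=True)
             for ids in input_ids],
            dtype=torch.bool,
        )

        masked = torch.zeros_like(input_ids, dtype=torch.bool)
        for row in range(input_ids.size(0)):
            valid = (~special[row]).nonzero(as_tuple=True)[0].tolist()
            if not valid:
                continue
            budget = max(1, int(len(valid) * self.mask_ratio))
            # Keep dropping spans at random start positions until the budget is met
            while masked[row].sum().item() < budget:
                start = random.choice(valid)
                end = min(start + self.span_length, input_ids.size(1))
                masked[row, start:end] = ~special[row, start:end]

        labels[~masked] = -100                         # compute loss only on masked positions
        input_ids[masked] = self.tokenizer.mask_token_id
        batch["input_ids"] = input_ids
        batch["labels"] = labels
        return batch
```

The idea would be to pass an instance of this as `data_collator` to the Trainer. Is something like this a reasonable approach, or is there built-in support I’m missing?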
Thanks!