Now, consider that 1 is equivalent to the [MASK] token. I want to run a RobertaForMaskedLM model to predict alternatives to only the masked tokens, i.e. the tokens having entry = 1, using a standard model(**inputs) call where inputs is of the form {'input_ids': torch.tensor(...), 'attention_mask': torch.tensor(...), 'labels': torch.tensor(...)}.
How can I go about achieving this?
If you train with MaskedLM, set labels only for the [MASK] tokens. From the RoBERTa docs:
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring) Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size]
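As a quick illustration of that -100 convention: the masked-LM head computes its loss with PyTorch's CrossEntropyLoss, whose default ignore_index is -100, so labels set to -100 contribute nothing to the loss. A minimal sketch with random logits (the vocab size 50265 is roberta-base's and is only an assumption here):

```python
import torch
import torch.nn as nn

# Random logits for one 5-token sequence; 50265 = roberta-base vocab size (assumed).
logits = torch.randn(5, 50265)
labels = torch.tensor([-100, 22, -100, -100, 44])

# CrossEntropyLoss ignores targets equal to -100 by default,
# so only positions 1 and 4 (the masked tokens) are scored.
loss = nn.CrossEntropyLoss()(logits, labels)
print(loss)
```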
You need to convert your input text into input encodings (token IDs). Let me show an example.
Here is an original input sequence (the vocab IDs are just arbitrary examples):
[a,b,c,d,e]
[12,22,55,465,44]
Now mask some of the tokens (the mask token ID is 1 here; you can check it in the tokenizer's special token mapping). Masking b and e gives:
[a,[MASK],c,d,[MASK]]
[12,1,55,465,1]
Then build the labels: each [MASK] position is labeled with its original token ID so the model learns to predict it, and every other position is ignored.
[ignore, b, ignore, ignore, e]
[-100, 22, -100, -100, 44]
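In code, a minimal sketch of building those labels (assuming mask token ID 1, as in this example; a real tokenizer exposes it as tokenizer.mask_token_id):

```python
import torch

# Original and masked IDs from the example above (mask token ID assumed to be 1).
original_ids = torch.tensor([12, 22, 55, 465, 44])
input_ids = torch.tensor([12, 1, 55, 465, 1])   # b and e replaced by the mask token

# Keep the original ID at masked positions, set -100 everywhere else.
labels = original_ids.clone()
labels[input_ids != 1] = -100
print(labels)                                   # tensor([-100, 22, -100, -100, 44])
```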
The attention_mask is used to avoid performing attention on padding token indices. If your input is padded to max_length = 6, the input and attention_mask will look like this (pad token ID = 3):
[a,[MASK],c,d,[MASK],[PAD]]
[12,1,55,465,1,3]
attention_mask will be [1,1,1,1,1,0], where 1 = not a pad token and 0 = pad token.
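To actually get alternatives for only the masked positions (the original question), the sketch below is one way to do it. It assumes a stock roberta-base checkpoint and a made-up example sentence, so it reads the mask ID from the tokenizer rather than hard-coding the toy IDs 1 and 3 used above; labels are not needed when you only want predictions. The selection logic is the same in either setup: find the positions where input_ids equals the mask token ID and take the top-k logits there.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.eval()

# A made-up sentence with one masked position, padded to a fixed length.
text = f"The capital of France is {tokenizer.mask_token}."
enc = tokenizer(text, padding="max_length", max_length=12, return_tensors="pt")

inputs = {
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],  # 1 = real token, 0 = padding
}

with torch.no_grad():
    logits = model(**inputs).logits           # shape: (batch, seq_len, vocab_size)

# Positions where the input is the mask token
# (tokenizer.mask_token_id here; ID 1 in the question's own setup).
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)

# Top-5 alternative tokens for each masked position.
top5 = logits[mask_positions].topk(5, dim=-1).indices
for candidates in top5:
    print(tokenizer.convert_ids_to_tokens(candidates.tolist()))
```

With your own tensors you would compare input_ids against 1 instead of tokenizer.mask_token_id and pass your existing input_ids and attention_mask directly.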