Now, consider that 1 is equivalent to the [MASK] token. I want to run a RobertaForMaskedLM model to predict alternatives to only the masked tokens, i.e. the tokens having entry = 1, using a standard model(**inputs) call where inputs is of the form {'input_ids': torch.tensor(...), 'attention_mask': torch.tensor(...), 'labels': torch.tensor(...)}.
How can I go about achieving this?
If you train with MaskedLM, set labels only for the [MASK] tokens. From the RoBERTa docs:
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring) Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size]
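As a quick illustration of that -100 convention: the masked-LM head computes its loss with PyTorch's CrossEntropyLoss, whose default ignore_index is -100, so labels set to -100 contribute nothing to the loss. A minimal sketch with random logits (the vocab size 50265 is roberta-base's and is only an assumption here):

```python
import torch
import torch.nn as nn

# Random logits for one 5-token sequence; 50265 = roberta-base vocab size (assumed).
logits = torch.randn(5, 50265)
labels = torch.tensor([-100, 22, -100, -100, 44])

# CrossEntropyLoss ignores targets equal to -100 by default,
# so only positions 1 and 4 (the masked tokens) are scored.
loss = nn.CrossEntropyLoss()(logits, labels)
print(loss)
```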
You need to convert your input text into input encodings (token IDs). Let me show an example.
Here is an original input sequence (the vocab IDs are just arbitrary examples):
[a,b,c,d,e]
[12,22,55,465,44]
Now mask some of the tokens (the mask token ID is 1 here; you can check it in the tokenizer's special token mapping). Masking b and e gives:
[a,[MASK],c,d,[MASK]]
[12,1,55,465,1]
Then build the labels: each [MASK] position is labeled with its original token ID so the model learns to predict it, and every other position is ignored.
[ignore, b, ignore, ignore, e]
[-100, 22, -100, -100, 44]
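In code, a minimal sketch of building those labels (assuming mask token ID 1, as in this example; a real tokenizer exposes it as tokenizer.mask_token_id):

```python
import torch

# Original and masked IDs from the example above (mask token ID assumed to be 1).
original_ids = torch.tensor([12, 22, 55, 465, 44])
input_ids = torch.tensor([12, 1, 55, 465, 1])   # b and e replaced by the mask token

# Keep the original ID at masked positions, set -100 everywhere else.
labels = original_ids.clone()
labels[input_ids != 1] = -100
print(labels)                                   # tensor([-100, 22, -100, -100, 44])
```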
The attention_mask is used to avoid performing attention on padding token indices. If your input is padded to max_length = 6, the input and attention_mask will look like this (pad token ID = 3):
[a,[MASK],c,d,[MASK],[PAD]]
[12,1,55,465,1,3]
attention_mask will be [1,1,1,1,1,0], where 1 = not a pad token and 0 = pad token.
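To actually get alternatives for only the masked positions (the original question), the sketch below is one way to do it. It assumes a stock roberta-base checkpoint and a made-up example sentence, so it reads the mask ID from the tokenizer rather than hard-coding the toy IDs 1 and 3 used above; labels are not needed when you only want predictions. The selection logic is the same in either setup: find the positions where input_ids equals the mask token ID and take the top-k logits there.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.eval()

# A made-up sentence with one masked position, padded to a fixed length.
text = f"The capital of France is {tokenizer.mask_token}."
enc = tokenizer(text, padding="max_length", max_length=12, return_tensors="pt")

inputs = {
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],  # 1 = real token, 0 = padding
}

with torch.no_grad():
    logits = model(**inputs).logits           # shape: (batch, seq_len, vocab_size)

# Positions where the input is the mask token
# (tokenizer.mask_token_id here; ID 1 in the question's own setup).
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)

# Top-5 alternative tokens for each masked position.
top5 = logits[mask_positions].topk(5, dim=-1).indices
for candidates in top5:
    print(tokenizer.convert_ids_to_tokens(candidates.tolist()))
```

With your own tensors you would compare input_ids against 1 instead of tokenizer.mask_token_id and pass your existing input_ids and attention_mask directly.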