Mask More Than One Word

Here, it says you can mask k tokens. However, the documentation only shows masking a single token. Is it possible to mask k words, or am I mistaken?


Are you using a fill-mask pipeline? If so, there’s a hard-coded limit of a single mask in the class, even though the model itself may support multiple masks. I guess the added behavior would warrant some more functionality for choosing how to sample: if there are N masked tokens, each with a few top-k candidates, one might want to sample either from the joint distribution (i.e. ranking the pairs based on p1*p2) or independently. The best approach would depend on the model’s internals, I suppose.

I saw a post a while back welcoming a PR for this matter, so it’s a wanted feature.
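
For illustration, here is a minimal sketch of the two ranking strategies described above (independent vs. joint ranking of combinations by p1*p2). The checkpoint (roberta-base), the top-k size of 5, and the example sentence are only assumptions for the sketch, not something the pipeline does:

import torch
from itertools import product
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumption: a RoBERTa-style masked LM; "roberta-base" is only an example checkpoint.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

sentence = "Paris is <mask> <mask> to visit."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits.squeeze(0)      # (seq_len, vocab_size)

mask_positions = (inputs["input_ids"].squeeze(0) == tokenizer.mask_token_id).nonzero().flatten()
probs = logits[mask_positions].softmax(dim=-1)       # (n_masks, vocab_size)
top_probs, top_ids = probs.topk(k=5, dim=-1)         # top-5 candidates per mask

# Independent: simply take the single best token at each masked position.
independent = [tokenizer.decode(ids[0].item()).strip() for ids in top_ids]
print("independent:", independent)

# "Joint" ranking (still under an independence assumption): score every
# combination of per-mask candidates by the product p1 * p2 * ... and sort.
combos = []
for combo in product(*(range(5) for _ in mask_positions)):
    score = 1.0
    words = []
    for mask_idx, cand_idx in enumerate(combo):
        score *= top_probs[mask_idx, cand_idx].item()
        words.append(tokenizer.decode(top_ids[mask_idx, cand_idx].item()).strip())
    combos.append((score, words))
combos.sort(key=lambda x: x[0], reverse=True)
print("best joint combination:", combos[0])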

I was using the following. However, this code does not work well if you’re aiming to fill consecutive tokens.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# The post doesn't say which checkpoint was used; any RoBERTa-style masked LM
# (one whose mask token is <mask>) works, e.g.:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

sentence = "The capital of France <mask> contains the Eiffel <mask>."

token_ids = tokenizer.encode(sentence, return_tensors='pt')
print(tokenizer.tokenize(sentence))

# Indices of every <mask> token in the input
masked_position = (token_ids.squeeze() == tokenizer.mask_token_id).nonzero()
masked_pos = [mask.item() for mask in masked_position]
print(masked_pos)

with torch.no_grad():
    output = model(token_ids)

# Logits over the vocabulary for every position in the sequence
logits = output[0].squeeze()

print("\n\n")
print("sentence : ", sentence)
print("\n")

# Top-100 candidate tokens for each masked position
list_of_list = []
for mask_index in masked_pos:
    mask_logits = logits[mask_index]
    idx = torch.topk(mask_logits, k=100, dim=0)[1]
    words = [tokenizer.decode(i.item()).strip() for i in idx]
    list_of_list.append(words)
    print(words)

# Naive best guess: the most probable token at each mask, chosen independently
best_guess = ""
for j in list_of_list:
    best_guess = best_guess + " " + j[0]
print(best_guess)

---

What I may try to do is mask two consecutive tokens.

(ex: Paris is <mask> <mask> to visit.)

I'll then re-insert the most probable token for the first <mask>. 

(ex: Paris is a <mask> to visit.)

Then, I'll return the second <mask>'s most probable token. 

(ex: Paris is a city to visit.)

This is the updated code.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# As in the first snippet, the checkpoint is not shown in the original post;
# any RoBERTa-style masked LM works, e.g.:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

sentence = "The capital of France <mask> <mask> the Eiffel Tower."

token_ids = tokenizer.encode(sentence, return_tensors='pt')
print(tokenizer.tokenize(sentence))

# Positions of both <mask> tokens
masked_position = (token_ids.squeeze() == tokenizer.mask_token_id).nonzero()
masked_pos = [mask.item() for mask in masked_position]
print(masked_pos)

with torch.no_grad():
    output = model(token_ids)

logits = output[0].squeeze()

print("\n\n")
print("sentence : ", sentence)
print("\n")

# Top-5 candidate tokens for each masked position
list_of_list = []
for mask_index in masked_pos:
    mask_logits = logits[mask_index]
    idx = torch.topk(mask_logits, k=5, dim=0)[1]
    words = [tokenizer.decode(i.item()).strip() for i in idx]
    list_of_list.append(words)
    # print(words)

# Build five candidate sentences by filling only the FIRST mask with each of
# its top-5 predictions; the second mask is left in place.
sentences3 = []
for i in range(5):
    first_mask_word = list_of_list[0][i]
    sentences3.append(sentence.replace("<mask>", first_mask_word, 1))

# Re-run the model on each candidate sentence to predict the remaining mask.
for candidate in sentences3:
    token_ids = tokenizer.encode(candidate, return_tensors='pt')

    masked_position = (token_ids.squeeze() == tokenizer.mask_token_id).nonzero()
    masked_pos = [mask.item() for mask in masked_position]
    print(masked_pos)

    with torch.no_grad():
        output = model(token_ids)
    logits = output[0].squeeze()

    print("\n\n")
    print("sentence : ", candidate)
    print("\n")

    for mask_index in masked_pos:
        mask_logits = logits[mask_index]
        idx = torch.topk(mask_logits, k=5, dim=0)[1]
        words = [tokenizer.decode(i.item()).strip() for i in idx]
        print(words)

Output:

sentence :  The capital of France <mask> <mask> the Eiffel Tower.

[6]
sentence :  The capital of France , <mask> the Eiffel Tower.
['and', 'with', 'near', 'at', 'including']

[6]
sentence :  The capital of France is <mask> the Eiffel Tower.
['in', 'now', 'called', 'at', 'under']

[6]
sentence :  The capital of France lies <mask> the Eiffel Tower.
['atop', 'under', 'beneath', 'in', 'behind']

[6]
sentence :  The capital of France stands <mask> the Eiffel Tower.
['atop', 'at', 'on', 'under', 'in']

[6]
sentence :  The capital of France rests <mask> the Eiffel Tower.
['atop', 'on', 'in', 'upon', 'under']
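
A more compact sketch of the same greedy strategy (always fill the first remaining mask with its single most probable token, then re-run the model) could look like the following; the roberta-base checkpoint is again only an assumption:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumption: a RoBERTa-style masked LM; "roberta-base" is only an example checkpoint.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

def fill_masks_greedily(sentence):
    """Repeatedly replace the first remaining <mask> with its most probable token."""
    while tokenizer.mask_token in sentence:
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits.squeeze(0)
        mask_positions = (inputs["input_ids"].squeeze(0) == tokenizer.mask_token_id).nonzero().flatten()
        # Fill only the first mask, then re-run the model on the partially filled sentence.
        best_id = logits[mask_positions[0]].argmax().item()
        best_word = tokenizer.decode(best_id).strip()
        sentence = sentence.replace(tokenizer.mask_token, best_word, 1)
    return sentence

print(fill_masks_greedily("The capital of France <mask> <mask> the Eiffel Tower."))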

Hey, thanks for the code.
Did you manage to solve it for consecutive masked tokens? I’m having the same problem.

Cheers,
Fran

@franfram I had the same issue… Do you happen to know which model supports multiple masked tokens during inference?

@cnut1648
I did, but it doesn’t work that well (I’m using a Spanish model; maybe an English one will work better).
Here’s a Colab with the code.

Hope it helps,
cheers

I see, thanks! I tried with an English one, and it seems that RoBERTa for English also doesn’t work well…