Fill mask with subwords

ockapuh · June 6, 2021, 7:09am

Hi everyone.

Is it possible to use the fill-mask pipeline with subwords? In Spanish, pronouns sometimes appear attached to the verb.

For a given verb such as “dile”, I'm trying to mask the pronoun “le” so I can compare its probability with that of another pronoun, “la”.

I've tried inputs like “di[MASK]” with targets like ["le", "la"] and ["##le", "##la"], but the result is always unknown tokens.

Is this even solvable? I have also tried masking the whole verb, but since the tokenizer splits it into subwords, that doesn't work.

I cannot add every verb+pronoun combination to the BERT vocab, and the pronouns are already in the vocab on their own.
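To make the idea concrete, here is a minimal sketch of what I am after: mask at the token level (replace the “##le” piece itself, not the string) and compare the candidates by their vocab entries in the softmax at that position. The vocab, the logits and the helper names (`wordpiece`, `mask_piece`, `candidate_probs`) are all made up for illustration. With a real model I assume the logits would come from something like `AutoModelForMaskedLM` and the ids from `tokenizer.convert_tokens_to_ids`, and that "##le" and "##la" exist as entries in that model's vocab.

```python
import math

# Toy vocabulary and logits, invented for illustration only.
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
             "di", "##le", "##la", "##lo", "que", "venga"]
# Hypothetical model scores at the masked position, one per vocab entry.
TOY_LOGITS = [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 2.0, 1.0, 0.5, 0.0, 0.0]


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into WordPiece-style pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            sub = word[start:end]
            candidate = sub if start == 0 else "##" + sub
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no piece matches: the whole word is unknown
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def mask_piece(pieces, target):
    """Replace the first occurrence of `target` with [MASK]; return tokens and index."""
    idx = pieces.index(target)
    return pieces[:idx] + ["[MASK]"] + pieces[idx + 1:], idx


def candidate_probs(mask_logits, vocab, candidates):
    """Softmax over the whole vocab at the masked position, then look up candidates."""
    top = max(mask_logits)
    exps = [math.exp(x - top) for x in mask_logits]
    total = sum(exps)
    return {c: exps[vocab.index(c)] / total for c in candidates}


pieces = wordpiece("dile", TOY_VOCAB)
masked, position = mask_piece(pieces, "##le")
probs = candidate_probs(TOY_LOGITS, TOY_VOCAB, ["##le", "##la"])
print(pieces, masked, position, probs)
```

The point of looking candidates up by their exact vocab string is that "##le" never gets re-tokenized, which is where I suspect the unknown tokens come from.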

Thanks, everyone.
