Hi everyone.
Is it possible to use the fill-mask pipeline with subwords? In Spanish, pronouns sometimes appear attached to the verb.
For a given verb such as “dile”, I'm trying to mask the pronoun “le” so I can compare its probability with that of another pronoun, “la”.
I've tried inputs like “di[MASK]” with targets like [“le”, “la”] and ["##le", "##la"], but the result is always unknown tokens.
Is there any way to solve this? I have also tried masking the whole verb, but since the tokenizer splits it into subwords, that doesn't work either.
I cannot add every verb+pronoun combination to the BERT vocab, and the pronouns are already in it as standalone tokens.
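
To make the splitting issue concrete, here is a toy sketch of greedy longest-match WordPiece-style tokenization. It is only an illustration: `wordpiece` and `TOY_VOCAB` are made-up names, and the tiny vocab is an assumption, not the real BERT vocab. It shows how the attached pronoun can end up as the continuation piece “##le”, which is a different vocab entry from the standalone “le”.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word, WordPiece style."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocab (an assumption for illustration only).
TOY_VOCAB = {"di", "le", "la", "##le", "##la", "[MASK]", "[UNK]"}

print(wordpiece("dile", TOY_VOCAB))  # ['di', '##le']
print(wordpiece("le", TOY_VOCAB))    # ['le']
```

With a vocab like this, the pronoun inside “dile” corresponds to “##le” rather than “le”, so the comparison I'm after would be between “##le” and “##la” at the position right after “di”.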
Thanks, everyone.