Hello,
I have added two special tokens ([ss], [se]) to the GPT-2 vocabulary and use it to generate sequences. I want to prohibit GPT-2 from generating some words after it has generated [ss], until it generates [se]. For example, in the following sequence:

some words of the sequence [ss] here I want some words not to be generated [se] other words of the sequence.

In other words, between [ss] … [se] I want to exclude some words from the generation. As @deathcrush answered here, it is possible to prohibit GPT-2 from generating some vocabulary ids in general, but is it possible to do it conditionally, i.e. only between the tokens [ss] and [se]?
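For context, here is a minimal sketch of the unconditional ban I can already do with the bad_words_ids argument of generate (the banned words and the prompt are just placeholders):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Placeholder words; the leading space matters for GPT-2's BPE,
# otherwise you get the ids of the word without a preceding space
bad_words_ids = [tokenizer.encode(" " + w) for w in ["foo", "bar"]]

input_ids = tokenizer.encode("Some prompt", return_tensors="pt")
# This bans the ids everywhere, not only between [ss] and [se]
output = model.generate(input_ids, max_length=30, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output[0]))
```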
Could you check out this function for constrained prefix generation: Models — transformers 4.4.2 documentation
Thank you @patrickvonplaten for your fast reply.
The link you provided is for the models page. Could you point me to the function you are talking about?
@patrickvonplaten I saw that the generate function has a prefix_allowed_tokens_fn argument. Were you referring to this?
Yes, that is what he meant.
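For example, a conditional ban could look roughly like this; the special-token setup and the banned words below are placeholder assumptions, not your actual setup. The function receives the ids generated so far and returns the list of ids that are allowed at the next step:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["[ss]", "[se]"]})
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))

ss_id = tokenizer.convert_tokens_to_ids("[ss]")
se_id = tokenizer.convert_tokens_to_ids("[se]")

# Hypothetical words to exclude between [ss] and [se]
banned = {i for w in ["foo", "bar"] for i in tokenizer.encode(" " + w)}
full_vocab = list(range(len(tokenizer)))
restricted_vocab = [i for i in full_vocab if i not in banned]

def allowed_tokens(batch_id, input_ids):
    # input_ids is the 1-D tensor of ids generated so far for this
    # sequence; we are "inside" a span iff the most recently seen
    # special token is [ss] rather than [se]
    inside = False
    for t in input_ids.tolist():
        if t == ss_id:
            inside = True
        elif t == se_id:
            inside = False
    return restricted_vocab if inside else full_vocab
```

Returning full Python lists over the whole vocabulary at every step is not fast, but precomputing both lists keeps the sketch simple.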
This function is used to constrain beam search. I want to generate text with sampling, specifically with top_p. Is there a way to utilise this function?
@patrickvonplaten in the documentation of the generate function, the description of the prefix_allowed_tokens_fn argument says:

If provided, this function constraints the beam search to allowed tokens only at each step.

Can this function be applied with sampling as well?
Hey @hfnlpmb,
yes, the function can be used in combination with do_sample=True.
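For example, reusing the allowed_tokens function sketched earlier in the thread (the prompt and the top_p value are just placeholders):

```python
input_ids = tokenizer.encode("Some prompt [ss]", return_tensors="pt")

# The constraint is applied as a logits processor at every step,
# so it composes with sampling just like with beam search
output = model.generate(
    input_ids,
    do_sample=True,
    top_p=0.9,
    max_length=50,
    prefix_allowed_tokens_fn=allowed_tokens,
)
print(tokenizer.decode(output[0]))
```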