I have added two special tokens ([ss] and [se]) to the GPT-2 vocabulary and use the model to generate sequences.
I want to prohibit GPT-2 from generating some words after it has generated [ss], until it generates [se]. For example, in the following sequence:
some words of the sequence [ss] here I want some words not to be generated [se] other words of the sequence.
In other words, between [ss] and [se] I want to exclude some words from the generation. As @deathcrush answered here, it is possible to prohibit GPT-2 from generating some vocabulary ids in general, but is it possible to do it conditionally (only between those two tokens)?
Could you check out this function for constrained prefix generation: Models — transformers 4.4.2 documentation
Thank you @patrickvonplaten for your fast reply.
The link you provided is for the models page. Could you point me to the function you are talking about?
@patrickvonplaten I saw that the generate function has a prefix_allowed_tokens_fn argument. Were you referring to this?
Yes, that is what he meant.
This function is used to constrain beam search. I want to generate text with sampling, and specifically with top_p. Is there a way to utilise this function?
@patrickvonplaten in the documentation of the generate function, the description of the prefix_allowed_tokens_fn argument says:
If provided, this function constraints the beam search to allowed tokens only at each step.
Can this function be applied with sampling as well?
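For anyone landing here with the same question, here is a rough sketch of how the conditional ban might be written as a prefix_allowed_tokens_fn callback and passed to generate together with top_p sampling. The helper names (make_prefix_allowed_tokens_fn, sample_with_conditional_ban) are made up for illustration, and it assumes that [ss] and [se] are already in the tokenizer, that the model embeddings were resized accordingly, and that the banned words are given as token ids (a word that splits into several GPT-2 tokens would need extra handling).

```python
def make_prefix_allowed_tokens_fn(vocab_size, ss_id, se_id, banned_ids):
    """Hypothetical helper: ban `banned_ids` only inside an open [ss] ... [se] span."""
    all_ids = list(range(vocab_size))
    banned = set(banned_ids)
    allowed_inside = [i for i in all_ids if i not in banned]

    def prefix_allowed_tokens_fn(batch_id, input_ids):
        # generate() passes a 1-D tensor of the tokens so far (prompt included);
        # plain lists are accepted too, which makes the callback easy to test.
        ids = input_ids.tolist() if hasattr(input_ids, "tolist") else list(input_ids)
        inside = False
        for tok in ids:
            if tok == ss_id:
                inside = True    # a span was opened
            elif tok == se_id:
                inside = False   # the span was closed
        # The return value is the list of token ids allowed at the next step.
        return allowed_inside if inside else all_ids

    return prefix_allowed_tokens_fn


def sample_with_conditional_ban(model, tokenizer, prompt, banned_ids,
                                top_p=0.9, max_length=50):
    """Hypothetical wrapper: top_p sampling with the conditional ban applied."""
    fn = make_prefix_allowed_tokens_fn(
        vocab_size=model.config.vocab_size,  # assumes embeddings were resized
        ss_id=tokenizer.convert_tokens_to_ids("[ss]"),
        se_id=tokenizer.convert_tokens_to_ids("[se]"),
        banned_ids=banned_ids,
    )
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output = model.generate(
        input_ids,
        do_sample=True,
        top_p=top_p,
        max_length=max_length,
        prefix_allowed_tokens_fn=fn,
    )
    return tokenizer.decode(output[0])
```

Note that [se] itself stays allowed inside the span, so the model can still close it. Returning the full vocabulary as a list at every step is simple but not fast; a custom logits processor would probably be cheaper if speed matters.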
yes the function can be used in combination with