Whitelist specific tokens in beam search

Mel · March 12, 2021, 5:05am

I’m using model.generate() for text generation.
I’m wondering whether there is a way to whitelist specific tokens so that they are returned during the beam search phase.
For example, I want to “force” the response to contain a question mark or a specific phrase.

100worte · March 12, 2021, 10:39am

Every token is “whitelisted” in the sense that it is considered during beam search. However, you can modify or boost the scores of the tokens you would like to see generated by using a LogitsProcessor. Just implement your own class that boosts your favourite tokens or sequences of tokens. If you are looking for a question mark, maybe you can also “blacklist” other end-of-sentence punctuation in your LogitsProcessor?
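
To make the idea concrete, here is a minimal sketch of such a class, not a definitive implementation. It assumes a reasonably recent transformers release where `LogitsProcessor` can be imported from the top-level package. The class name `BoostTokensLogitsProcessor`, the `bonus` parameter, and the token ids are hypothetical and only there for illustration.

```python
import torch
from transformers import LogitsProcessor


class BoostTokensLogitsProcessor(LogitsProcessor):
    """Adds a fixed bonus to the scores of selected token ids (hypothetical helper)."""

    def __init__(self, token_ids, bonus=5.0):
        self.token_ids = list(token_ids)
        self.bonus = float(bonus)

    def __call__(self, input_ids, scores):
        # scores has shape (batch_size * num_beams, vocab_size).
        boosted = scores.clone()  # leave the caller's tensor untouched
        boosted[:, self.token_ids] += self.bonus
        return boosted


# Toy check on made-up scores: 2 beams, vocabulary of 10 tokens.
processor = BoostTokensLogitsProcessor(token_ids=[3, 7], bonus=5.0)
fake_input_ids = torch.tensor([[1, 2], [1, 4]])
fake_scores = torch.zeros(2, 10)
boosted_scores = processor(fake_input_ids, fake_scores)
```

In real use the ids would come from your tokenizer, for example something like `tokenizer.convert_tokens_to_ids("?")`, though how a question mark is tokenized depends on the tokenizer. Note that a boost only makes a token more likely; it does not guarantee that it appears.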

Mel · March 16, 2021, 11:25am

Thanks @100worte,
I’m new to transformers and I’m still trying to understand how to use the LogitsProcessor.
Do I just manually adjust the logit scores that are returned?
How do I pass my custom class into model.generate?

100worte · March 16, 2021, 7:13pm

You can pass a LogitsProcessorList to beam_search. For some concrete examples of LogitsProcessors, refer to generation_logits_process.py.
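
As a rough sketch of what the list does: a `LogitsProcessorList` is called with the current `input_ids` and `scores` and applies each processor in order, which is what the decoding loop does at every step. The two small classes and all token ids below are hypothetical stand-ins (7 for “?”, 4 and 5 for other end-of-sentence punctuation), and the import path assumes a recent transformers release.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList


class BoostTokens(LogitsProcessor):
    """Adds a bonus to the scores of selected token ids (hypothetical helper)."""

    def __init__(self, token_ids, bonus=3.0):
        self.token_ids = list(token_ids)
        self.bonus = float(bonus)

    def __call__(self, input_ids, scores):
        boosted = scores.clone()
        boosted[:, self.token_ids] += self.bonus
        return boosted


class BanTokens(LogitsProcessor):
    """Sets the scores of selected token ids to -inf so they cannot be picked."""

    def __init__(self, token_ids):
        self.token_ids = list(token_ids)

    def __call__(self, input_ids, scores):
        banned = scores.clone()
        banned[:, self.token_ids] = float("-inf")
        return banned


# Hypothetical ids: 7 stands in for "?", 4 and 5 for "." and "!".
processors = LogitsProcessorList([BoostTokens([7]), BanTokens([4, 5])])

input_ids = torch.tensor([[1, 2, 3]])
# Uniform log-probabilities over a toy vocabulary of 10 tokens.
scores = torch.log_softmax(torch.zeros(1, 10), dim=-1)

# Processors run in list order: boost first, then ban.
processed = processors(input_ids, scores)
best_token = processed.argmax(dim=-1).item()
```

Here the boosted id ends up with the highest score and the banned ids can no longer be selected.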

xxbidiao · November 11, 2021, 4:54pm

Is it possible to pass a custom LogitsProcessor to model.generate()?
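
For what it’s worth, more recent transformers releases accept a `logits_processor` argument on `generate()`, which was not the case for every version around when this thread started. The sketch below assumes such a release. It uses a tiny, randomly initialised GPT-2 so that nothing has to be downloaded, and the `AllowOnlyTokens` class, the config sizes, and the token ids are all made up for illustration. This is a hard whitelist (everything else is set to -inf), which is stricter than the boosting suggested above.

```python
import torch
from transformers import (
    GPT2Config,
    GPT2LMHeadModel,
    LogitsProcessor,
    LogitsProcessorList,
)


class AllowOnlyTokens(LogitsProcessor):
    """Masks every token except the allowed ids (hypothetical helper)."""

    def __init__(self, allowed_token_ids):
        self.allowed = list(allowed_token_ids)

    def __call__(self, input_ids, scores):
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.allowed] = 0.0
        return scores + mask


torch.manual_seed(0)
# Tiny random model, only to exercise the API; its outputs are meaningless.
config = GPT2Config(
    vocab_size=50, n_positions=32, n_embd=16, n_layer=1, n_head=2,
    bos_token_id=0, eos_token_id=0, pad_token_id=0,
)
model = GPT2LMHeadModel(config).eval()

prompt = torch.tensor([[1, 2, 3]])
output = model.generate(
    prompt,
    attention_mask=torch.ones_like(prompt),
    num_beams=2,
    do_sample=False,
    max_new_tokens=5,
    logits_processor=LogitsProcessorList([AllowOnlyTokens([10, 11, 12, 13])]),
)
# Drop the prompt to keep only the newly generated ids.
new_tokens = output[0, prompt.shape[1]:]
```

With a real model you would build the allowed ids from the tokenizer. Keep in mind that masking out the EOS token, as this toy example does, means generation only stops at the length limit.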
