Hi, I’m using bad_words_ids to constrain the generation vocabulary of an MBART model so that it generates text in the desired language. However, one problem I’m seeing is that the model tends to generate gibberish tokens (punctuation, numbers) right from the start, which results in the whole sequence consisting of nothing but punctuation/numbers.
I would like to modify generation so that certain bad_words_ids are only applied until a certain min_length has been reached (initially the model must generate word tokens; once past the length threshold it may start generating numbers & punctuation).
How can I implement this in transformers, or is there existing functionality for this? I’m willing to implement a custom method but I don’t know where to start. Many thanks in advance!
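For reference, here is a rough sketch of what I have in mind: a custom logits processor that masks a set of token ids only while the sequence is shorter than a threshold. The class name, the token-id list, and the threshold are all placeholders I made up; it also only handles single-token bad words, not the multi-token sequences that bad_words_ids supports. In practice I assume it would subclass `transformers.LogitsProcessor` and be passed to `generate` via `logits_processor`.

```python
import torch


class MinLengthBadWordsProcessor:
    """Hypothetical sketch: bans the given token ids only while the
    generated sequence is shorter than min_length; after that,
    generation is unconstrained for those tokens."""

    def __init__(self, bad_token_ids, min_length):
        self.bad_token_ids = list(bad_token_ids)  # single-token ids only
        self.min_length = min_length

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        cur_len = input_ids.shape[-1]
        if cur_len < self.min_length:
            # Mask the banned tokens so they cannot be chosen yet.
            scores[:, self.bad_token_ids] = -float("inf")
        return scores
```

If that is roughly right, I imagine it would be used as something like `model.generate(..., logits_processor=LogitsProcessorList([MinLengthBadWordsProcessor(punct_and_digit_ids, 10)]))`, where `punct_and_digit_ids` is a list I would build from the tokenizer.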