This HuggingFace blog article is a very useful introduction to configuring the various model.generate() methods for generating text. Most of these models accept a no_repeat_ngram_size
parameter which specifies that the generated text may not contain any n-gram of that size more than once. This removes the problem of generative models repeating large swathes of document text.
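To make the mechanism concrete, here is a minimal pure-Python sketch of the idea behind n-gram blocking: given the tokens generated so far, find every token that would complete an n-gram we have already emitted. (This is an illustration of the technique, not the transformers library's actual implementation; the function name is my own.)

```python
def banned_next_tokens(generated, n):
    """Return the set of tokens that would complete an n-gram already
    present in `generated` -- the idea behind no_repeat_ngram_size."""
    if n <= 0 or len(generated) < n:
        return set()
    # The last n-1 tokens are the prefix the next token would extend.
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    banned = set()
    # Slide over every n-gram seen so far; if its first n-1 tokens match
    # the current prefix, emitting its final token would repeat it.
    for i in range(len(generated) - n + 1):
        ngram = tuple(generated[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned
```

For example, with `generated = [1, 2, 3, 1, 2]` and `n = 3`, the trigram `(1, 2, 3)` has already occurred and the sequence currently ends in `(1, 2)`, so token `3` is banned as the next token.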
However, it suggests two improvements that could be made to all generative models:
- a list of "negative topic" n-grams could be provided to model.generate() to block those n-grams from ever being generated. For instance, if you want to generate an article on New York City but do not want it to touch on Covid-19, you could pass in a list of a few dozen n-grams on the topic of Covid-19.
- a list of "approved repeats" n-grams could also be provided. As stated in the blog, one trade-off of the no_repeat_ngram_size approach is that a recurring phrase such as "New York City" can only ever appear once in the output if no_repeat_ngram_size <= 3, despite the fact that an article on New York City would be expected to use it dozens of times.
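As a side note on the first idea, transformers' generate() already accepts a bad_words_ids argument that blocks given token sequences outright, which covers part of the "negative topic" use case. Below is a self-contained, pure-Python sketch of how both ideas might combine in one masking step; `negative_ngrams` and `approved_repeats` are hypothetical parameter names of my own, not an existing API.

```python
def banned_next_tokens(generated, n, negative_ngrams=(), approved_repeats=()):
    """Tokens to mask out before sampling the next token.

    - Repeating an n-gram already in `generated` is banned, unless that
      n-gram is listed in `approved_repeats` (hypothetical allow-list).
    - Completing any n-gram in `negative_ngrams` is always banned
      (hypothetical "negative topic" list, any length).
    """
    banned = set()
    approved = {tuple(g) for g in approved_repeats}
    if n > 0 and len(generated) >= n:
        suffix = tuple(generated[len(generated) - n + 1:])
        for i in range(len(generated) - n + 1):
            ngram = tuple(generated[i:i + n])
            if ngram[:-1] == suffix and ngram not in approved:
                banned.add(ngram[-1])
    # Ban the final token of a negative n-gram whenever the generated
    # text currently ends with that n-gram's prefix.
    for g in negative_ngrams:
        g = tuple(g)
        if len(g) == 1 or tuple(generated[-(len(g) - 1):]) == g[:-1]:
            banned.add(g[-1])
    return banned
```

In a real implementation this logic would live in a logits processor that sets the banned tokens' scores to negative infinity before sampling.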
Since models such as OPTForCausalLM share their generation methods through a common base class, implementing this once ought to cover all models. This makes this post a feature request.
Is there an appropriate GitHub repository to discuss such feature requests and/or bugs?
Can anybody recommend, in addition to this blog post, other introductory resources on text generation?
Thank you. David