Ignore numbers during generation


We are working on a text generation problem (model.generate()). One of the requirements is that the generated text shouldn’t contain any numbers, currency amounts, or dates.

Is there a way to achieve this through the generate method?



One thing that worked well for me is proposing numbers in the prompt, then finding them. E.g., “New ideas for 2022:”
Another thing that has worked in other cases is using logit bias. I didn’t try biasing all numbers, but it works well for other things, so you might as well try biasing all digits.
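
To make the logit-bias idea concrete, here is a minimal sketch in plain Python. It assumes the vocabulary is available as a token-to-id dict; the function names and the bias value of -100 are illustrative assumptions, not part of any library. It adds a large negative bias to every token containing an ASCII digit, so those tokens end up with near-zero probability after the softmax.

```python
import math

DIGITS = "0123456789"


def build_digit_bias(vocab, bias=-100.0):
    """Map each token id whose token text contains an ASCII digit to `bias`.

    `vocab` is assumed to be a dict of token string -> token id.
    """
    return {
        token_id: bias
        for token, token_id in vocab.items()
        if any(ch in DIGITS for ch in token)
    }


def apply_logit_bias(logits, logit_bias):
    """Return a copy of `logits` with each bias added at its token id."""
    biased = list(logits)
    for token_id, bias in logit_bias.items():
        biased[token_id] += bias
    return biased


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary and next-token logits, made up for illustration.
toy_vocab = {"ideas": 0, "2022": 1, "7": 2, "new": 3}
toy_logits = [1.0, 3.0, 2.0, 0.5]

toy_bias = build_digit_bias(toy_vocab)
toy_probs = softmax(apply_logit_bias(toy_logits, toy_bias))
```

With a real model the same adjustment would be applied to the scores at every decoding step, for example inside a custom LogitsProcessor passed to generate. A softer bias (say -5) only discourages digits instead of effectively forbidding them.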

Another possible solution is to use the bad_words_ids parameter of the model.generate method. It lets you ban specific tokens or token sequences from the output. You could use it to list every token ID whose token text contains a digit.
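
A rough sketch of how that could look. The helper names digit_bad_words_ids and generate_without_digits are hypothetical, and the code assumes a Hugging Face tokenizer whose get_vocab() returns a token-to-id dict. It only matches ASCII digits in the raw vocabulary strings, so it does not cover dates spelled out in words.

```python
def digit_bad_words_ids(vocab, banned_chars="0123456789"):
    """Build a bad_words_ids list: one single-token entry per vocab item
    whose token text contains any character from `banned_chars`.

    `vocab` is assumed to be a dict of token string -> token id,
    e.g. the result of tokenizer.get_vocab().
    """
    return [
        [token_id]
        for token, token_id in sorted(vocab.items(), key=lambda item: item[1])
        if any(ch in banned_chars for ch in token)
    ]


def generate_without_digits(model, tokenizer, prompt, **generate_kwargs):
    """Hypothetical wrapper around model.generate that bans digit tokens.

    Assumes a PyTorch model and tokenizer from transformers.
    """
    bad_words_ids = digit_bad_words_ids(tokenizer.get_vocab())
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, bad_words_ids=bad_words_ids, **generate_kwargs
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Passing banned_chars="0123456789$" would also block the dollar sign. Non-ASCII currency symbols such as € may be split across several byte-level tokens in some tokenizers, so they would likely need separate handling. Also check that no special token your model relies on contains a digit (some vocabularies have such tokens), since this helper would ban it too.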

This looks a lot like Constrained Beam Search, but AFAIK that only lets you specify tokens you do want in the output, not ones you don’t. @ranish80 suggests using the bad_words_ids parameter of the model.generate method; this makes sense, but I wonder why it isn’t part of the Constraints class.