Hi, I’m using bad_words_ids to constrain the generation vocabulary of an MBART model so that it generates text in the desired language. However, one problem I’m seeing is that the model tends to generate gibberish tokens (punctuation, numbers) right from the start, which results in the whole sequence consisting of nothing but punctuation/numbers.
I would like to modify generation so that certain bad_words_ids are only applied until a certain min_length has been reached (initially the model must generate word tokens; once past the length threshold it may start generating numbers & punctuation).
How can I implement this in transformers, or is there existing functionality for this? I’m willing to implement a custom method but I don’t know where to start. Many thanks in advance!
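For reference, here is a rough sketch of what I have in mind: a custom logits processor that masks a set of token ids only while the sequence is shorter than a threshold. The class name, the token-id list, and the threshold are all placeholders I made up; it also only handles single-token bad words, not the multi-token sequences that bad_words_ids supports. In practice I assume it would subclass `transformers.LogitsProcessor` and be passed to `generate` via `logits_processor`.

```python
import torch


class MinLengthBadWordsProcessor:
    """Hypothetical sketch: bans the given token ids only while the
    generated sequence is shorter than min_length; after that,
    generation is unconstrained for those tokens."""

    def __init__(self, bad_token_ids, min_length):
        self.bad_token_ids = list(bad_token_ids)  # single-token ids only
        self.min_length = min_length

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        cur_len = input_ids.shape[-1]
        if cur_len < self.min_length:
            # Mask the banned tokens so they cannot be chosen yet.
            scores[:, self.bad_token_ids] = -float("inf")
        return scores
```

If that is roughly right, I imagine it would be used as something like `model.generate(..., logits_processor=LogitsProcessorList([MinLengthBadWordsProcessor(punct_and_digit_ids, 10)]))`, where `punct_and_digit_ids` is a list I would build from the tokenizer.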