Generate function and stopping criteria: stop once an entire word has been generated (continue while a subtoken is only part of a word)

I am using the generate function to generate several possible continuations of a sentence context, along with their probabilities. It works reasonably well, but I run into problems when a word is made up of more than one token.

Since some generated tokens are only sub-parts of words, I need a way to generate output only up to a word boundary. I think I should be able to solve this with stopping_criteria, but I cannot figure out how to implement it.

Below is an example:

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

model_name = 'PlanTL-GOB-ES/gpt2-large-bne'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id)

context =  "Las brujas vuelan en una" # witches fly on a
input_ids = tokenizer.encode(context, return_tensors='pt')

# punctuation and whitespace that should never be generated
bad_list = tokenizer([' ', ',', '.', '..', '...', '....', '.....', ':', ';', '"', '"', '?', '!',  '/', '-', 
                      '(', ')', '()', "'", ' ', ']', '['])

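outputs = model.generate(input_ids, 
                       temperature= 0.1,
                       max_new_tokens= 3,
                       num_beams= 10,                  # assumed setting: needed to return several sequences
                       num_return_sequences= 10,       # assumed setting: matches the 10 outputs listed below
                       return_dict_in_generate= True,  # required for outputs.sequences
                       output_scores= True,            # keep scores so probabilities can be computed
                       bad_words_ids = bad_list.input_ids)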

gen_sequences = outputs.sequences[:, input_ids.shape[-1]:]
token_list = gen_sequences.tolist()
for token_ids in token_list:
    print(token_ids, tokenizer.decode(token_ids))

The input sentence is “Witches fly on a” and the generated outputs are:

[749, 13750, 313]  escoba de
[749, 13750, 342]  escoba y
[749, 13750, 1192]  escoba vol
[16234, 313, 8326]  bola de cristal
[749, 13750, 341]  escoba que
[749, 13750, 21127]  escoba mágica
[313, 387, 37835]  de las carrozas
[11568, 603, 313]  carroza de
[34999, 1625, 313]  avioneta de
[749, 13750, 350]  escoba con

Here, the word “escoba” (broom) consists of two subtokens, 749 and 13750, whereas the word “bola” (ball) consists of only one token, 16234. I want some way of telling the generate function to generate up to and including, but no more than, the tokens that make up one word.
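
As a fallback, I could trim each generated sequence after the fact. Below is a rough sketch of that idea, not a tested solution for this model. It assumes the tokenizer is a GPT-2 style byte-level BPE in which a token that starts a new word is prefixed with "Ġ" when inspected via tokenizer.convert_ids_to_tokens. The helper names first_word_length and trim_to_first_word are my own, and the token strings in the usage lines are made up for illustration (I have not checked the real splits).

```python
from typing import List

# Assumption: byte-level BPE marks a token that begins a new word with "Ġ".
WORD_START = "Ġ"


def first_word_length(tokens: List[str], word_start: str = WORD_START) -> int:
    """Return how many leading tokens belong to the first word.

    The first word ends right before the next token that starts a new word.
    If no such token appears, all tokens are returned, so the last word
    may still be incomplete.
    """
    for i, tok in enumerate(tokens):
        if i > 0 and tok.startswith(word_start):
            return i
    return len(tokens)


def trim_to_first_word(token_ids: List[int], tokens: List[str]) -> List[int]:
    """Keep only the ids of the first word.

    `tokens` is expected to be tokenizer.convert_ids_to_tokens(token_ids).
    """
    return token_ids[:first_word_length(tokens)]


# Hypothetical token strings, for illustration only.
escoba = trim_to_first_word([749, 13750, 313], ["Ġesc", "oba", "Ġde"])
bola = trim_to_first_word([16234, 313, 8326], ["Ġbola", "Ġde", "Ġcristal"])
print(escoba, bola)
```

This only works if max_new_tokens is large enough for the token after the word to be generated, and it wastes compute on tokens that are thrown away. I assume the same "next token starts a new word" check could live inside a custom StoppingCriteria subclass, which is why I would prefer to have generate stop by itself.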

Is this possible?