Implementation of StoppingCriteriaList

Dear HF,

Would someone please show me how to use the stopping criteria?

I would like to stop generation if certain words/phrases are generated, e.g. “foo bar” or “moo bar foo”.

The instructions seem to use the BERT tokeniser to generate the tokens of the stop sequence?

I am trying to implement this with the OPT model (13b) - would I still use the BERT tokeniser?

Would anyone be able to show an example of using this successfully?


Hi!

This is my first reply ever, so please don’t judge too strictly. :)

I was able to solve a similar task today (for GPT-2), so maybe my suggestion will help you.

So, first things first:

  1. Q: “The instructions seem to use the Bert tokeniser - to generate tokens of the stop sequence?”
    A: In order to stop the sequence, the model needs to know which token (or sequence of tokens) should trigger the stop. tokenizer.encode is used so you can see which tokens correspond to your word (e.g. “foo bar”), for example:

stop_words_ids = [
    tokenizer.encode(stop_word, add_prefix_space = False) for stop_word in ["foo", "bar"]]

  2. Q: “I am trying to implement this with the OPT model (13b) - would I still use the BERT tokeniser?”
    A: No, you should use the tokenizer of your respective model to get the correct tokens. For example, here are the token values I got for the words “foo” and “bar” using the tokenizer of the model I’m currently training: [[21943], [5657]]. The tokens will be different for different models, which is why you should use the tokenizer of your own model (a quick check for this is sketched after step d below), e.g.:
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-13b", use_fast=False)
  3. Q: “Would anyone be able to show an example of using this successfully?”

    A: Here is what you should do once you know the token IDs to use for stopping:
a) import required modules: from transformers import StoppingCriteria, StoppingCriteriaList
b) subclass the StoppingCriteria class and add new functionality to it:

class StoppingCriteriaSub(StoppingCriteria):

    def __init__(self, stops = []):
      StoppingCriteria.__init__(self), 

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, stops = []):
      self.stops = stops
      for i in range(len(stops)):
        self.stops = self.stops[i]

c) instantiate the class (and pass the tokens which you want to use for stopping as an argument):

stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops = [[21943], [5657]])])

d) finally, pass stopping_criteria as an argument to model.generate:
model.generate(input_ids, do_sample=True, stopping_criteria=stopping_criteria)
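
As a quick check related to point 2 above, you can print the IDs your own tokenizer assigns to each stop phrase before wiring them into the criteria. A minimal sketch, assuming the OPT checkpoint mentioned in the question (the exact IDs you see will differ between models):

from transformers import AutoTokenizer

# Inspect the token IDs your model's tokenizer assigns to each stop phrase.
# The checkpoint name is taken from the question above; swap in your own model.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-13b", use_fast=False)

for phrase in ["foo bar", "moo bar foo"]:
    print(phrase, "->", tokenizer.encode(phrase, add_special_tokens=False))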

I hope it is helpful.

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, stops = []):
      self.stops = stops
      for i in range(len(stops)):
        self.stops = self.stops[i]

What is this stopping criterion actually doing? It should return True if a token in input_ids occurs in self.stops.

I tried stopping on newlines using:


class StoppingCriteriaSub(StoppingCriteria):

    def __init__(self, stops = []):
      StoppingCriteria.__init__(self), 

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, stops = []):
      self.stops = stops
      for i in range(len(stops)):
        self.stops = self.stops[i]

stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops = [[13], [198], [0], [30], [11]])])

with a pre-trained GPT-2 model, but I am still getting the stop tokens in the output multiple times. Generation uses beam search as follows:

output = model.generate(input_ids_batch,
                        early_stopping=True, num_beams=5,
                        temperature=0.7,
                        top_p=0.8,
                        do_sample=True,
                        pad_token_id=50256,
                        stopping_criteria=stopping_criteria,
                        output_scores=True,
                        return_dict_in_generate=True)

But this setup didn’t work for me. Any pointers are appreciated!

I think there is a lot of room for improvement, but it worked for me.

a) I declared the stop list as follows:

stop_words_ids = [
    tokenizer(stop_word, return_tensors='pt')['input_ids'].squeeze() for stop_word in ["###"]]

b) This class counts how many times the stop token ID occurs while generating the text. The “encounters” value has to be adjusted if you are using a prompt that contains samples. Suppose you have two samples in the prompt: then you need to pass 3 as the “encounters” value (2 for the prompt + 1 for the generation) when instantiating the class (see letters c and d).

class StoppingCriteriaSub(StoppingCriteria):

    def __init__(self, stops = [], encounters=1):
      super().__init__()
      self.stops = stops
      self.ENCOUNTERS = encounters

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
      stop_count = 0
      for stop in self.stops:
        stop_count = (stop == input_ids[0]).sum().item()

      if stop_count >= self.ENCOUNTERS:
          return True
      return False

c) Prompt example:

sample = '''sentence: I love cars.
paraphrase: I like cars.

###

sentence: I love motorcycles.
paraphrase: I like motorcycles.

###
sentence: I love bicycle.
paraphrase: '''

d) This is how I used it. Since I have two samples in the prompt, I need to set encounters to 3 (the 2 samples + 1). You can count the occurrences of “###” in the prompt template and add 1 so you don’t hardcode it (see the short sketch after the code below).

from transformers import StoppingCriteriaList

stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids, encounters=3)])
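
As mentioned in d), a small sketch of deriving “encounters” from the prompt instead of hardcoding it (reusing the sample string from c)):

from transformers import StoppingCriteriaList

# Count how many times the separator already appears in the prompt, plus one for the generation.
encounters = sample.count("###") + 1

stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids, encounters=encounters)])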

Thank you so much for providing this example, including the encounters mechanism. I have been trying to use your code with a list of stop_words. However, I keep getting this error message about the size of the tensors. Do you have any idea what I might be doing wrong?

from transformers import StoppingCriteria, StoppingCriteriaList

stop_words_ids = [
    tokenizer(stop_word, return_tensors='pt')['input_ids'].squeeze() for stop_word in stop_words]

class StoppingCriteriaSub(StoppingCriteria):

    def __init__(self, stops = [], encounters=1):
      super().__init__()
      self.stops = stops
      self.ENCOUNTERS = encounters

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
      stop_count = 0
      for stop in self.stops:
        stop_count = (stop == input_ids[0]).sum().item()

      if stop_count >= self.ENCOUNTERS:
          return True
      return False

stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids, encounters=3)])

context = "Las brujas vuelan en una"

input_ids = tokenizer.encode(context, return_tensors='pt')

# generate outputs
generated_outputs = model.generate(input_ids, 
                                   return_dict_in_generate=True, 
                                   output_scores=True, 
                                   num_return_sequences=10, 
                                   num_beams=10,
                                   temperature= 0.1,
                                   max_new_tokens = 10,
                                   stopping_criteria=stopping_criteria)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[39], line 10
      7 print(len(tokenizer.encode(context)))
      9 # generate outputs
---> 10 generated_outputs = model.generate(input_ids, 
     11                                    return_dict_in_generate=True, 
     12                                    output_scores=True, 
     13                                    num_return_sequences=10, 
     14                                    num_beams=10,
     15                                    temperature= 0.1,
     16                                    max_new_tokens = 10,
     17                                    stopping_criteria=stopping_criteria)
     19 gen_sequences = generated_outputs.sequences[:, input_ids.shape[-1]:]
     21 for token in gen_sequences:

File /opt/anaconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File /opt/anaconda3/lib/python3.8/site-packages/transformers/generation/utils.py:1474, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, **kwargs)
   1467     input_ids, model_kwargs = self._expand_inputs_for_generation(
   1468         input_ids=input_ids,
   1469         expand_size=generation_config.num_beams,
   1470         is_encoder_decoder=self.config.is_encoder_decoder,
   1471         **model_kwargs,
   1472     )
   1473     # 13. run beam search
-> 1474     return self.beam_search(
   1475         input_ids,
   1476         beam_scorer,
   1477         logits_processor=logits_processor,
   1478         stopping_criteria=stopping_criteria,
   1479         pad_token_id=generation_config.pad_token_id,
   1480         eos_token_id=generation_config.eos_token_id,
   1481         output_scores=generation_config.output_scores,
   1482         return_dict_in_generate=generation_config.return_dict_in_generate,
   1483         synced_gpus=synced_gpus,
   1484         **model_kwargs,
   1485     )
   1487 elif is_beam_sample_gen_mode:
   1488     # 11. prepare logits warper
   1489     logits_warper = self._get_logits_warper(generation_config)

File /opt/anaconda3/lib/python3.8/site-packages/transformers/generation/utils.py:2803, in GenerationMixin.beam_search(self, input_ids, beam_scorer, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
   2800 # increase cur_len
   2801 cur_len = cur_len + 1
-> 2803 if beam_scorer.is_done or stopping_criteria(input_ids, scores):
   2804     if not synced_gpus:
   2805         break

File /opt/anaconda3/lib/python3.8/site-packages/transformers/generation/stopping_criteria.py:113, in StoppingCriteriaList.__call__(self, input_ids, scores, **kwargs)
    111 @add_start_docstrings(STOPPING_CRITERIA_INPUTS_DOCSTRING)
    112 def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
--> 113     return any(criteria(input_ids, scores) for criteria in self)

File /opt/anaconda3/lib/python3.8/site-packages/transformers/generation/stopping_criteria.py:113, in <genexpr>(.0)
    111 @add_start_docstrings(STOPPING_CRITERIA_INPUTS_DOCSTRING)
    112 def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
--> 113     return any(criteria(input_ids, scores) for criteria in self)

Cell In[36], line 18, in StoppingCriteriaSub.__call__(self, input_ids, scores)
     16 stop_count = 0
     17 for stop in self.stops:
---> 18   stop_count = (stop == input_ids[0]).sum().item()
     20 if stop_count >= self.ENCOUNTERS:
     21     return True

RuntimeError: The size of tensor a (2) must match the size of tensor b (7) at non-singleton dimension 0
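
I think the failing comparison can be reproduced in isolation; a minimal sketch with the sizes from the error message (the token IDs here are just illustrative):

import torch

# A stop word that tokenizes to 2 ids, compared against the 7 ids generated so far
stop = torch.tensor([21943, 5657])   # illustrative ids
input_ids_row = torch.arange(7)      # illustrative generated ids
stop == input_ids_row                # raises: size of tensor a (2) must match size of tensor b (7)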

The following code worked for me. Be aware that I didn’t implement the “encounters” parameter and that I “manually” move the stop IDs to the GPU (so this could be made cleaner). Comparing each stop sequence only against the last len(stop) tokens of input_ids[0] also avoids the size-mismatch error above when a stop word tokenizes to more than one ID:

class StoppingCriteriaSub(StoppingCriteria):

    def __init__(self, stops = [], encounters=1):
        super().__init__()
        # "encounters" is accepted but not used in this version (see the note above)
        self.stops = [stop.to("cuda") for stop in stops]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
        # Stop as soon as the last generated tokens match one of the stop sequences
        for stop in self.stops:
            if torch.all((stop == input_ids[0][-len(stop):])).item():
                return True

        return False


stop_words = ["<human>:", "<bot>:"]
stop_words_ids = [tokenizer(stop_word, return_tensors='pt')['input_ids'].squeeze() for stop_word in stop_words]
stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids)])
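
For completeness, a minimal usage sketch under the same assumptions (a CUDA model and tokenizer you have already loaded; the decoded text will still end with the stop word itself, which you can trim afterwards):

# Minimal usage sketch; "model" and "tokenizer" are whatever checkpoint you are working with.
prompt = "<human>: How are you?\n<bot>:"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to("cuda")
output = model.generate(input_ids, max_new_tokens=64, stopping_criteria=stopping_criteria)
print(tokenizer.decode(output[0], skip_special_tokens=True))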