Implementation of Stopping Criteria List

In addition to @hatimbr's comment, sometimes the same string may be mapped to different ids by the tokenizer, depending on the preceding tokens.
Example: in the context of the given text,

{
    "text": "\n'pizza',\n'calzone',\n'stromboli',\n'focaccia',\n'flatbread',\n'naan',\n'roti',\n'paratha']"
}

the final '] maps to tensor([525, 29962]), while my given stop sequence '] tokenized on its own maps to tensor(2033).
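You can see the mismatch directly (a quick sketch; it assumes a Llama-style tokenizer is already loaded as tokenizer, and the exact ids will vary by model):

# Sketch: the same surface string tokenizes differently depending on context.
ids_alone = tokenizer("']", add_special_tokens=False)["input_ids"]
ids_in_context = tokenizer("'paratha']", add_special_tokens=False)["input_ids"]
print(ids_alone)       # e.g. [2033]
print(ids_in_context)  # e.g. [..., 525, 29962]; here '] is split across different ids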

As a workaround:

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StoppingCriteriaSub(StoppingCriteria):
    def __init__(self, stops=None):
        super().__init__()
        # Decode the stop sequences once up front; there is no need to move them
        # to CUDA, since only the decoded strings are compared.
        self.stops = [tokenizer.decode(stop) for stop in (stops or [])]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop as soon as the decoded last token matches any decoded stop string.
        last_token = tokenizer.decode(input_ids[0][-1])
        return any(stop == last_token for stop in self.stops)

To use it:

stop_words = ["]", "']", "']\n", "]\n", "\n\n", "']\n\n"]
stop_words_ids = [tokenizer(stop_word, return_tensors='pt', add_special_tokens=False)['input_ids'].squeeze() for stop_word in stop_words]
stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids)])
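The criteria list is then passed to generate() (a minimal sketch; model and prompt are placeholders for your own model and input):

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    stopping_criteria=stopping_criteria,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))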

Decoding on every generation step may be slowing down text generation, so if anyone has better suggestions, I'm eager to listen.
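One idea I haven't benchmarked (a sketch; StopOnStrings and tail_tokens are names I made up here): decode only a short tail of the sequence each step and match it against the stop strings directly, which also catches stops that span multiple tokens:

class StopOnStrings(StoppingCriteria):
    def __init__(self, tokenizer, stop_strings, tail_tokens=5):
        super().__init__()
        self.tokenizer = tokenizer
        self.stop_strings = stop_strings
        self.tail_tokens = tail_tokens  # how many trailing tokens to decode per step

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Decode the tail once and check whether it ends with any stop string.
        tail = self.tokenizer.decode(input_ids[0][-self.tail_tokens:])
        return any(tail.endswith(s) for s in self.stop_strings)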

I got Llama 2 to produce a parsable list with this.
