Hello, I’m trying to avoid truncated output during text summarization. I’ve been tuning parameters of the .generate() function such as max_length, num_beams, early_stopping…
I found out I could use the StoppingCriteriaList class to specify that generation should stop when a period (.) is generated, but this seems to be ignored: I sometimes get an output with two sentences (each ending in a period). This is my code so far:
from transformers import StoppingCriteriaList

def my_stopping_criteria(output):
    if output != None:
        if "." in output["generated_text"]:
            return True

stopping_criteria = StoppingCriteriaList([lambda self, output: my_stopping_criteria(output)])

inputs = tokenizer(texto, return_tensors="pt").input_ids
outputs = model.generate(inputs, max_length=1000, stopping_criteria=stopping_criteria, num_beams=3, no_repeat_ngram_size=2, early_stopping=False)
tokenizer.decode(outputs, skip_special_tokens=True)
In the end, I simply want my output not to be truncated. Does anyone know how to achieve that when doing text summarization?
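To make concrete what I mean by "not truncated": a fallback I could live with is trimming the decoded summary back to its last complete sentence. Below is a minimal sketch of that idea in plain Python. The helper name trim_to_last_sentence is hypothetical (it is not part of transformers), and it assumes the summary is already a decoded string and that sentences end in ".", "!" or "?" followed by whitespace or the end of the text.

```python
import re

# Hypothetical helper, not part of transformers: keep the text up to the last
# sentence-ending punctuation mark and drop any trailing fragment.
# Assumption: a sentence ends with ".", "!" or "?" followed by whitespace or
# the end of the string, so decimals such as "3.14" are not treated as ends.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def trim_to_last_sentence(text: str) -> str:
    text = text.strip()
    ends = [match.end() for match in _SENTENCE_END.finditer(text)]
    # If no sentence terminator is found, return the text unchanged
    # rather than an empty string.
    return text[: ends[-1]] if ends else text


# Example with a made-up summary that was cut off mid-sentence.
summary = "The model was trained on news data. It reaches good scores on"
print(trim_to_last_sentence(summary))
```

This only hides the truncation after the fact and does not change how generation itself stops, so I would still prefer a solution at the .generate() level.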