It seems like you know a lot about how this works. So, if setting `tokenizer.pad_token = tokenizer.eos_token` causes Falcon to generate text endlessly, right up to the max-length cutoff, how do you stop that from happening? Do you have time to provide a code snippet? All I can think of is:
```python
raw_pad_token = "<pad>"
# tokenizer(raw_pad_token) would return input IDs, not a token string,
# so register the string itself as the pad token instead
tokenizer.add_special_tokens({"pad_token": raw_pad_token})
```
But based on this thread, that alone isn't enough to make it work.
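Just so I'm not asking in the abstract, here's a rough sketch of what I imagine the full fix might look like. This is only a guess on my part, assuming the standard transformers calls `add_special_tokens`, `resize_token_embeddings`, and `config.pad_token_id`, with `tiiuae/falcon-7b` as a placeholder checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b"  # placeholder checkpoint, just for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Give padding its own token instead of reusing EOS, so masking the
# padding during training no longer masks the EOS token as well
tokenizer.add_special_tokens({"pad_token": "<pad>"})

# The new token has no embedding row yet, so the model's embedding
# matrix has to grow to match the enlarged vocabulary
model.resize_token_embeddings(len(tokenizer))

# Keep the model config in sync so generate() knows the pad token
model.config.pad_token_id = tokenizer.pad_token_id
```

Is the embedding resize the piece I was missing, or is there more to it than that?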