I assume that a very common use case like question answering would ideally output only the newly generated tokens (essentially discarding the prompt tokens). Is there a standard way to achieve this?
I understand we can pass the `return_full_text=False` parameter to the pipeline object's `__call__` method to achieve this. Is there a way to do it directly with the `generate` method of `GenerationMixin`?
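For decoder-only (causal LM) models, one common approach is to slice the prompt positions off the tensor that `generate` returns, since each returned row is the prompt followed by the new tokens. Below is a minimal sketch, assuming a decoder-only model (`gpt2` is used purely as an example) and left padding, so that every prompt in the batch occupies the same `input_ids.shape[1]` positions. `strip_prompt` and `demo` are hypothetical helper names for illustration, not part of the transformers API. For encoder-decoder models such as T5, `generate` returns only the decoder output, so no slicing should be needed there.

```python
from typing import List, Sequence


def strip_prompt(sequences: Sequence[Sequence[int]], prompt_length: int) -> List[List[int]]:
    """Hypothetical helper: drop the first `prompt_length` token ids from each row.

    Assumes left padding, so every row's (padded) prompt has the same length
    and a single slice position works for the whole batch.
    """
    return [list(seq[prompt_length:]) for seq in sequences]


def demo(model_name: str = "gpt2") -> List[str]:
    """Hypothetical end-to-end example; needs transformers + torch and downloads the model."""
    # Lazy imports so strip_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.padding_side = "left"  # keep all prompts right-aligned to one shared length
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompts = ["Question: What is the capital of France?\nAnswer:"]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    output_ids = model.generate(
        **inputs, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id
    )

    # Tensor equivalent of strip_prompt: keep only the positions after the prompt.
    new_ids = output_ids[:, inputs["input_ids"].shape[1]:]
    return tokenizer.batch_decode(new_ids, skip_special_tokens=True)
```

With transformers and torch installed, `print(demo())` should print only the generated continuation, without the question text. Slicing by the padded prompt length (rather than each prompt's unpadded length) is what makes left padding important here: with right padding, the new tokens would start at different positions in different rows.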