I assume a very common use case like Question Answering would ideally only need to output the generated tokens (essentially discarding the prompt tokens). Is there a standard way to achieve this?
I understand we can use the `return_full_text=False` parameter in the pipeline object's `__call__` method to achieve this. Is there a way to do it directly with the `generate` method of `GenerationMixin`?
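For context, here is a minimal sketch of the manual workaround I have in mind. It assumes a decoder-only (causal) model, where the sequences returned by `generate` start with the prompt tokens, so slicing off the prompt length leaves only the new tokens. The helper names `strip_prompt_tokens` and `generate_new_text_only` are my own, and `gpt2` is just a placeholder checkpoint.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    sequences: Sequence[Sequence[int]], prompt_length: int
) -> List[List[int]]:
    """Drop the first `prompt_length` token ids from every generated sequence."""
    return [list(seq[prompt_length:]) for seq in sequences]


def generate_new_text_only(
    prompt: str, model_name: str = "gpt2", max_new_tokens: int = 20
) -> str:
    """Generate a continuation and return only the newly generated text.

    Assumes a decoder-only model: its generate() output contains the prompt
    tokens followed by the new tokens.
    """
    # Imported lazily so strip_prompt_tokens works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    # Number of prompt tokens (second dimension of the input_ids tensor).
    prompt_length = inputs["input_ids"].shape[1]

    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # placeholder choice: gpt2 has no pad token
    )

    # Keep only the tokens that come after the prompt, then decode them.
    new_token_ids = strip_prompt_tokens(output_ids.tolist(), prompt_length)
    return tokenizer.decode(new_token_ids[0], skip_special_tokens=True)
```

Calling `generate_new_text_only("Q: What is the capital of France?\nA:")` should then return just the continuation. As far as I can tell this slicing is not needed for encoder-decoder models, since their `generate` output holds only decoder tokens, but I would still prefer a built-in option over doing this by hand.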