By the way, I think models return more than the required number of responses because the extra output is easy to trim afterwards by manipulating lists, dictionaries, and strings in Python. If you want to go that route, I can show you how.
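Here is a rough sketch of the kind of trimming I mean. I'm assuming the output is a list of dictionaries with a `generated_text` key and that the prompt is echoed at the start of each text; your pipeline may return a different shape. `trim_outputs` and the sample data are made up for illustration only.

```python
def trim_outputs(outputs, prompt, max_responses=1):
    """Keep at most max_responses items and drop the echoed prompt from each.

    Assumes each item is a dict with a "generated_text" key (an assumption
    about the output shape, adjust it to whatever you actually get back).
    """
    trimmed = []
    for item in outputs[:max_responses]:  # list slicing drops the extra responses
        text = item["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]  # cut off the echoed prompt
        trimmed.append(text.strip())
    return trimmed


# Hypothetical sample data, not real model output
sample = [
    {"generated_text": "Q: What is 2+2? A: 4"},
    {"generated_text": "Q: What is 2+2? A: four"},
]

print(trim_outputs(sample, "Q: What is 2+2?", max_responses=1))
```

If your results come back as plain strings instead of dictionaries, the same slicing and `startswith` check should work with the `item["generated_text"]` lookup removed.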
That said, I think some models return only the answer by default, though I don't know whether there is a setting for that somewhere. With pipelines, it's hard to tell which part is affecting the results…
Maybe you could try a different model once. That would tell us whether the problem is in the program or in the model's settings.