I am using the generate() function from AsyncClient in Text Generation Inference to query a bunch of models. If I want to generate more than one candidate response for a given prompt, how do I do that? I see an n parameter (number of responses to generate) in the chat() function, but there is no equivalent in the generate() function.
The naive method I am working with right now is to just repeat the same prompt multiple times, but I am wondering if there is a better way?
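For concreteness, my naive approach looks roughly like the sketch below. `generate_stub` is a placeholder standing in for the real `AsyncClient.generate` call (I have simplified away the client setup); the point is the pattern of firing the same prompt n times concurrently with asyncio.gather:

```python
import asyncio

# Placeholder for AsyncClient.generate from the text_generation client;
# in real code this would be `await client.generate(prompt, ...)`.
async def generate_stub(prompt: str) -> str:
    await asyncio.sleep(0)  # stands in for the network round-trip
    return f"response to: {prompt}"

async def generate_n_candidates(prompt: str, n: int) -> list[str]:
    # Fire n identical requests concurrently and collect all candidates.
    return await asyncio.gather(*(generate_stub(prompt) for _ in range(n)))

candidates = asyncio.run(generate_n_candidates("What is TGI?", 3))
print(len(candidates))
```

This works, but it means one HTTP request per candidate, which is why I am asking whether the server can do it in a single call.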
On a related note, what is the difference between using these two functions?