I am using the generate() function from AsyncClient in Text Generation Inference to query a bunch of models. If I want to generate more than one candidate response for a given prompt, how do I do that? I see an n parameter (number of responses to generate) in the chat() function, but there is no equivalent in the generate() function.
The naive method I am working with right now is to just repeat the same prompt multiple times, but I am wondering if there is a better way?
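For concreteness, my naive approach looks roughly like the sketch below. `generate_stub` is a placeholder standing in for the real `AsyncClient.generate` call (I have simplified away the client setup); the point is the pattern of firing the same prompt n times concurrently with asyncio.gather:

```python
import asyncio

# Placeholder for AsyncClient.generate from the text_generation client;
# in real code this would be `await client.generate(prompt, ...)`.
async def generate_stub(prompt: str) -> str:
    await asyncio.sleep(0)  # stands in for the network round-trip
    return f"response to: {prompt}"

async def generate_n_candidates(prompt: str, n: int) -> list[str]:
    # Fire n identical requests concurrently and collect all candidates.
    return await asyncio.gather(*(generate_stub(prompt) for _ in range(n)))

candidates = asyncio.run(generate_n_candidates("What is TGI?", 3))
print(len(candidates))
```

This works, but it means one HTTP request per candidate, which is why I am asking whether the server can do it in a single call.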
On a related note, what is the difference between using these two functions?