Text Generation Inference Model


Is there a way to measure the time to first token when using a TGI model? Does TGI report this metric by default in the generated response, or is there some other way to capture it?

This is the prefill time, which you can see either in the Prometheus metrics or with text-generation-benchmark. Cheers.
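If you also want a rough number from the client side, one option is to time how long a streamed response takes to yield its first token. The sketch below is only an illustration: `time_to_first_token` and `fake_stream` are hypothetical helper names, and `fake_stream` simulates a server with sleeps rather than calling TGI. A client-side figure also includes network and queueing delay, so it will not match the server's prefill time exactly.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def time_to_first_token(stream: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Consume a token stream and return
    (seconds until the first token, total seconds, number of tokens).
    The first value is None if the stream yields nothing."""
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    for _ in stream:
        if first is None:
            # Elapsed time when the first token arrives.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def fake_stream(prefill_s: float = 0.05, decode_s: float = 0.01, n: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response (not a TGI call):
    one longer pause before the first token, shorter pauses afterwards."""
    time.sleep(prefill_s)
    for i in range(n):
        if i > 0:
            time.sleep(decode_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = time_to_first_token(fake_stream())
    print(f"first token after {ttft:.3f}s, {count} tokens in {total:.3f}s")
```

With a real server you would pass in the iterator returned by whatever streaming client you use, for example `huggingface_hub`'s `InferenceClient.text_generation(..., stream=True)`, assuming that client is pointed at your TGI endpoint.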

What is text-generation-benchmark? Is it a separate repo?