Does TGI (GitHub - huggingface/text-generation-inference: Large Language Model Text Generation Inference) have any metric or API that reports the time taken to generate the first token?
Also, I receive the error below when I send `stream=true`. Does `decoder_input_details` work only with `stream=false`?

`"decoder_input_details == true is not supported when streaming tokens"`
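For context, I am currently approximating time-to-first-token client-side: with `stream=true`, the delay before the first streamed event is effectively the first-token latency. A minimal sketch of that measurement (the `fake_stream` generator below stands in for a real TGI token stream, and the helper name is my own, not a TGI API):

```python
import time
from typing import Iterator, Tuple

def time_to_first_token(stream: Iterator[str]) -> Tuple[float, str]:
    """Return (seconds until the first token arrived, first token)."""
    start = time.perf_counter()
    first = next(stream)  # blocks until the stream yields its first event
    return time.perf_counter() - start, first

def fake_stream() -> Iterator[str]:
    """Stand-in for a real TGI streaming response."""
    time.sleep(0.05)  # simulated prefill latency before the first token
    yield "Hello"
    yield " world"

ttft, token = time_to_first_token(fake_stream())
print(f"first token {token!r} after {ttft:.3f}s")
```

In practice I would pass the iterator returned by the streaming client instead of `fake_stream()`, but a server-side metric would be preferable to this client-side timing.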