Responses from chat completion is truncated to 100 tokens

PatrikKruse · August 18, 2024, 7:28pm

I am using chat completion with Microsofts SemanticKernel and all my responses are truncated to 100 tokens. (No matter what I have max_new_tokens set to.)

I have two questions, can I raise the bar above 100 tokens and is there a way to programmatically detect that the response are truncated?

Topic		Replies	Views
Text Generation response truncation Beginners	6	1347	August 18, 2024
Meta-Llama-3-8B-Instruct: "max_new_tokens" is not working for /v1/chat/completions Intermediate	1	816	July 2, 2024
Blank Responses Intermediate	1	251	October 5, 2023
How to return more tokens when calling the inference end point? Inference Endpoints on the Hub	4	1491	May 9, 2024
Response from retrieval chain is truncated Beginners	0	426	November 11, 2023

Responses from chat completion is truncated to 100 tokens

Related topics