Text Generation response truncation

I am having an issue where all responses are truncated, no matter which model or where they are hosted. Other questions on this topic have referenced required changes in langchain modules, but we are not using langchain.

For reference, I am getting the same truncated responses from:

- databricks/dolly-v2-3b run on Databricks ML Runtime 13.3 (56 GB RAM, 8 CPUs, GCP)
- databricks/dolly-v2-3b run on HF free inference using the huggingface_hub library
- databricks/dolly-v2-7b on a Hugging Face Inference Endpoint (Nvidia A10G) using the recommended sample input
- tiiuae/falcon-7b on a Hugging Face Inference Endpoint (Nvidia A10G) using the recommended sample input
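
On the Databricks side, I'm loading the model roughly like this (a minimal sketch following the dolly-v2-3b model card; the prompt and the max_new_tokens value below are just placeholders):

```python
import torch
from transformers import pipeline

# Load dolly-v2-3b with its custom instruct pipeline, as shown in the model card.
generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# max_new_tokens is a placeholder value here.
res = generate_text(
    "Explain to me the difference between nuclear fission and fusion.",
    max_new_tokens=256,
)
print(res[0]["generated_text"])
```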

In the case of HF Inference Endpoints, I get the same truncation using Python requests as I do with the sample UI. The endpoint settings are the defaults (max input tokens: 1024, max tokens: 1025); screenshot of the settings below.
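
For reference, the request I'm sending from Python is shaped roughly like this (a sketch; the endpoint URL, token, and prompt are placeholders, and as far as I understand max_new_tokens is the parameter that caps the generation length):

```python
import requests

# Placeholder endpoint URL and token.
API_URL = "https://my-endpoint.endpoints.huggingface.cloud"
headers = {
    "Authorization": "Bearer hf_xxx",
    "Content-Type": "application/json",
}

payload = {
    "inputs": "Explain to me the difference between nuclear fission and fusion.",
    "parameters": {
        # As far as I can tell, this is the setting that should control how long
        # the generated text is allowed to be.
        "max_new_tokens": 512,
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```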

This problem exists in other modes too. Here is a test on an Inference Endpoint, queried from Databricks (GCP) using requests, after setting the endpoint to text-to-text mode:

and the same, but after setting the endpoint to summarization mode:

Hey, @cCaldwell! Have you been able to solve this issue? I have started having the same problem while using the Llama 2 70B model.

Same issue with the 7B model.

I'm also facing the same issue :expressionless: on all the models I've tried.
Does anyone have a solution for this?

I am using chat completion with Microsoft's Semantic Kernel and get the same result: the responses are usually truncated at 100 tokens.

I have two questions: can I raise the limit above 100 tokens, and is there a way to programmatically detect that a response has been truncated?
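
For the second question, what I have in mind is something like the sketch below, using the huggingface_hub client directly rather than Semantic Kernel (the endpoint URL and token are placeholders, and I'm assuming finish_reason == "length" is what indicates the generation stopped at the token cap):

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL and token.
client = InferenceClient(
    model="https://my-endpoint.endpoints.huggingface.cloud",
    token="hf_xxx",
)

out = client.text_generation(
    "Explain to me the difference between nuclear fission and fusion.",
    max_new_tokens=512,  # raise the cap above the ~100 tokens I'm seeing now
    details=True,        # ask for generation details along with the text
)

print(out.generated_text)

# If the generation stopped because it hit max_new_tokens rather than a natural
# end-of-sequence, the details should report finish_reason == "length".
if out.details is not None and out.details.finish_reason == "length":
    print("Response was truncated at the token limit.")
```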