Text Generation response truncation

I am having an issue where all responses are truncated, no matter which model I use or where it is hosted. Other questions on this topic have referenced required changes in LangChain modules, but we are not using LangChain.

For reference, I am getting the same truncation from all of the following (a rough sketch of the local call pattern follows this list):
databricks/dolly-v2-3b run on Databricks ML Runtime 13.3 (56 GB RAM, 8 CPUs, GCP)
databricks/dolly-v2-3b run on the HF free Inference API via the huggingface_hub library
databricks/dolly-v2-7b on a Hugging Face Inference Endpoint (Nvidia A10G) using the recommended sample input
tiiuae/falcon-7b on a Hugging Face Inference Endpoint (Nvidia A10G) using the recommended sample input
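
For the local Databricks runs, this is a minimal sketch of the call pattern, following the loading code from the dolly model card; the prompt, dtype, and max_new_tokens value here are illustrative placeholders, not my exact settings:

```python
# Sketch: loading dolly-v2-3b with the custom instruct pipeline from the
# model card (trust_remote_code pulls in Databricks' pipeline code).
import torch
from transformers import pipeline

generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# max_new_tokens is passed explicitly at call time here; if the custom
# pipeline ignores call-time kwargs, it can be set when constructing the
# pipeline instead.
result = generate_text(
    "Explain the difference between nuclear fission and fusion.",
    max_new_tokens=256,
)
print(result[0]["generated_text"])
```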

In the case of the HF Inference Endpoints, I get the same truncated responses using Python requests as I do with the sample UI. The endpoint settings are the defaults: max input tokens 1024, max tokens 1025. Screenshot below:
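
A rough sketch of the kind of requests call involved (the URL, token, prompt, and parameter values are placeholders, not my exact ones). The parameters block is worth noting: if max_new_tokens is not sent, the server applies its own default, which can be small enough to look like truncation:

```python
# Sketch of querying a HF Inference Endpoint with requests; URL and token
# are placeholders.
import requests

API_URL = "https://<endpoint-name>.endpoints.huggingface.cloud"  # placeholder
headers = {
    "Authorization": "Bearer hf_xxx",  # placeholder token
    "Content-Type": "application/json",
}

payload = {
    "inputs": "Write a short paragraph explaining what Databricks is.",
    "parameters": {
        # Without an explicit max_new_tokens, the endpoint falls back to its
        # own default for generated length.
        "max_new_tokens": 512,
        "return_full_text": False,
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```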

This problem exists in other modes too. Here is a test on the inference endpoint, queried from Databricks (GCP) using requests, after setting the endpoint to text-to-text:

And the same, but after setting the inference endpoint to summarization mode:
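
The mode tests above use the same requests pattern; only the payload changes. A sketch of both payload shapes, with placeholder inputs (parameter names follow the hosted inference task documentation and may not match a particular endpoint's handler exactly):

```python
# Sketch of the payload shapes for the two modes tested above; inputs,
# URL, and token are placeholders.
import requests

API_URL = "https://<endpoint-name>.endpoints.huggingface.cloud"  # placeholder
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

# text-to-text style payload: max_new_tokens caps the generated length
text2text_payload = {
    "inputs": "Summarize: the quarterly report shows revenue grew 12 percent ...",
    "parameters": {"max_new_tokens": 256},
}

# summarization style payload: this task is usually bounded with
# min_length / max_length rather than max_new_tokens
summarization_payload = {
    "inputs": "Long article text goes here ...",
    "parameters": {"min_length": 50, "max_length": 200},
}

for payload in (text2text_payload, summarization_payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    print(response.json())
```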

Hey, @cCaldwell! Have you been able to solve this issue? I have started to have the same problem while using the Llama 2 70b model.

same issue with 7b