Incomplete/partial response generation

I am using LangChain and HuggingFaceHub with the Mistral-7B model, like this:

HuggingFaceService.llm = HuggingFaceHub(repo_id="mistralai/Mistral-7B-v0.1", model_kwargs={"temperature": 0.001, "max_length": 1000000000})
Below is my prompt, but it's not generating the complete response. It just returns this:

{
  "result": "\n\nThere are two types of workflows that can be created in the platform which include:"
}

How can I get the complete, usable response? I upgraded to a Pro account, since this model is available to Pro users only.

Entering new LLMChain chain…
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.

[Document(page_content='Types of workflow in the DigitalChameleon platform There are two types of workflows that can be created in the platform which include: 1. Conversation: A series of nodes with questions or text displayed to the customer in a sequence one by one, to capture the response of Customer, is referred to as Conversation workflow. The nodes of a workflow of conversation type are loaded on the webpage to the customer one at a time. The flow can be modified to return to a previous flow or allow customer to resume work at a later point in time. Workflow will go to the next node only when the customer performs the desired action in the previous node as configured in the workflow. 2. Form: A one time loading of the nodes/questions/messages to the end customer all at once in the UI of a form. The form will be created in the similar manner as we create for conversation in the CMS except for the workflow type in the journey properties should be selected as Form while creating/copying the workflow.')]

Question: Types of workflow in the DigitalChameleon platform
Helpful Answer:

Have you tried the instruct model? https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
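
For example, here is a minimal sketch of switching to the instruct checkpoint. It assumes the same HuggingFaceHub setup as in the question (API token in the environment, temperature carried over); the [INST] ... [/INST] wrapping is the prompt format the Mistral instruct models expect, and the question string is just an illustration.

from langchain.llms import HuggingFaceHub

# Point at the instruct checkpoint instead of the base model.
llm = HuggingFaceHub(
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",
    model_kwargs={"temperature": 0.001},
)

# Mistral instruct models expect the prompt wrapped in [INST] ... [/INST].
print(llm("[INST] List the types of workflow in the DigitalChameleon platform. [/INST]"))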

ANSWER:

It was because max_new_tokens is 20 by default. By increasing that value, I was able to get the full-length response.
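
For reference, a minimal sketch of the fix, assuming the same setup as in the question. Note the distinction between the two parameters: max_new_tokens caps only the generated tokens, whereas max_length counts the prompt plus the generation. The value 500 is illustrative.

from langchain.llms import HuggingFaceHub

# max_new_tokens (not max_length) controls how many tokens are generated;
# it defaults to 20 on the Hub inference API, so the response was truncated.
# 500 is an illustrative value; pick whatever your answers need.
llm = HuggingFaceHub(
    repo_id="mistralai/Mistral-7B-v0.1",
    model_kwargs={"temperature": 0.001, "max_new_tokens": 500},
)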

Increasing the maximum number of new tokens helps in some but not all cases.