How to make a custom LLM model give longer and more detailed answers (LlamaIndex)?

Hello everyone!

I have a question about the CustomLLM()/LocalOPT() classes.
I’m using the “facebook/opt-iml-1.3b” model on my *.csv documents, where users’ chats are stored.
The main problem is that I get short responses.
For example, I asked “Which of the users are mothers?”


Here I mostly get only one user in the response (except in “accumulate” mode, where there are 2 responses).

But the expected result is something like this (in one response): “There are two mothers in the chat - user Linaiva and user Alicerudra”.

Methods and parameters that I use:

prompt_helper = PromptHelper(context_window = 4096,  
                             num_output = 1024, 
                             chunk_overlap_ratio = 0.1)

llm = LLMPredictor(llm=LocalOPT())
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L12-v2"))
service_context = ServiceContext.from_defaults(
    llm_predictor=llm,
    embed_model=embed_model,
    prompt_helper=prompt_helper,
)
docs = read_documents(source, embed_model)
index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)

modes = ["accumulate", "compact_accumulate", "refine", "simple_summarize", "tree_summarize"]

for mode in modes:
    query_engine = index.as_query_engine(response_mode=mode)
    response = query_engine.query("Which users are mothers?")
    print("MODE:", mode, "\n\n", response.response, "\n")
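For example, one of the variations I tried was retrieving more chunks per query, since with the default similarity_top_k of 2 it is easy for only one relevant chunk to reach the LLM, so it never sees the second mother at all. A sketch of that variation (the helper name and top_k value here are just illustrative):

```python
def make_broad_query_engine(index, top_k=5):
    """Build a query engine that retrieves more chunks per query.

    With the default similarity_top_k (2 in the older LlamaIndex
    releases this code targets), messages from only one user may be
    retrieved, so the model cannot mention the other one.
    The helper name and top_k value are illustrative.
    """
    return index.as_query_engine(
        similarity_top_k=top_k,           # pull in more chat chunks
        response_mode="tree_summarize",   # merge them into one answer
    )
```

Even with this, the answers stay short.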

I tried to change the prompt_helper parameters, increase similarity_top_k, and change the LLM model, the embedding model, and the way of indexing, but the result is always the same.
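One thing I suspect is the generation cap inside LocalOPT itself: as far as I understand, PromptHelper’s num_output only reserves space in the prompt, while the actual limit is whatever max_new_tokens the HuggingFace pipeline is given (the transformers default, max_length=20, would produce exactly this kind of clipped answer). A sketch of what I mean — the helper function and the __main__ wiring are illustrative, not my real LocalOPT code:

```python
def opt_generate_kwargs(num_output: int = 1024) -> dict:
    """Generation settings matching PromptHelper(num_output=1024).

    Without max_new_tokens, transformers falls back to max_length=20
    (20 tokens total, prompt included), which truncates every answer.
    """
    return {
        "max_new_tokens": num_output,  # allow up to num_output new tokens
        "do_sample": False,            # deterministic output for Q&A
    }

if __name__ == "__main__":
    # Illustrative wiring for the completion call inside LocalOPT:
    from transformers import pipeline
    generator = pipeline("text-generation", model="facebook/opt-iml-1.3b")
    out = generator("Which users are mothers?", **opt_generate_kwargs())
    print(out[0]["generated_text"])
```

Is this the right place to raise the limit, or does LlamaIndex override it somewhere?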

Please help me if somebody has already faced this problem and resolved it. I would really appreciate your help!
