How to create a Q&A chatbot from a CSV file

Hi everyone,

Thank you in advance to those who are checking my thread. Any kind of help or guidance is greatly appreciated.

I have a CSV file with two columns, one for questions and another for answers, something like this:

Question | Answer
How many times should you wash your teeth per day? | It is advisable to wash them three times per day, after each meal.
How many times should I use dental floss per day? | The American Dental Association suggests that everyone should floss their teeth once a day.

[What I want to do]
I want the program to receive the user's question, find the most similar question in the CSV, and output the answer exactly as it appears in the answer column.

Because these questions are related to health, I do not want the program to hallucinate or answer from its own knowledge. Using temperature 0 alone is not enough: in my tests it sometimes changed the nuance of the answer or combined two answers into one.

To prevent this, I want the program to output exactly the same answer as the one in the answer column.
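
To make it concrete, this toy sketch is roughly the behaviour I am after. It only uses pandas and difflib from the standard library to stand in for the similarity step (a real solution would presumably use embeddings), answer_exactly is just a made-up helper name, and the column names match my CSV. The important part is that the returned answer is copied verbatim from the answer column:

import difflib
import pandas as pd

df = pd.read_csv("health.csv")  # columns: Question, Answer

def answer_exactly(user_question: str) -> str:
    # Score each stored question against the user's question
    scores = [
        difflib.SequenceMatcher(None, user_question.lower(), q.lower()).ratio()
        for q in df["Question"]
    ]
    best = scores.index(max(scores))
    # Return the paired answer verbatim, with no LLM rewriting it
    return df["Answer"].iloc[best]

print(answer_exactly("How often should I use dental floss?"))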

My current code is as follows. I created the index by loading the CSV with PandasCSVReader and building a GPTVectorStoreIndex.

Index creation:

loader = PandasCSVReader()  # llama_hub CSV loader (obtained via download_loader("PandasCSVReader"))
documents = loader.load_data(file=Path('./health.csv'))

# llm_predictor and prompt_helper are built the same way as in the query script below
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
index.storage_context.persist(persist_dir="./storage")  # so load_index_from_storage can find it


And the actual program is as follows:

Query engine

import os
os.environ["OPENAI_API_KEY"] = 'XXXXX'

from pathlib import Path
from llama_index import LLMPredictor, GPTVectorStoreIndex, PromptHelper, ServiceContext
from llama_index import StorageContext, load_index_from_storage
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import gradio as gr

def load_index():
    global index
    llm_predictor = LLMPredictor(
        llm=ChatOpenAI(
            streaming=True,
            callbacks=[StreamingStdOutCallbackHandler()],
            temperature=0,
            model_name="gpt-3.5-turbo",
            max_tokens=2000
        )
    )

    # PromptHelper / chunking settings
    max_input_size = 4096
    max_chunk_overlap = 0.2
    chunk_size_limit = 60
    num_outputs = 2000
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # Pass the prompt helper into the service context as well, otherwise it is never used
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    # Reload the index that was persisted to ./storage when it was built
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context, service_context=service_context)

def chat(chat_history, user_input):
    query_engine = index.as_query_engine()
    bot_response = query_engine.query(user_input)
    # Stream the answer back to the chatbot one character at a time
    response = ""
    for letter in bot_response.response:
        response += letter
        yield chat_history + [(user_input, response)]

with gr.Blocks(css="footer {visibility: hidden}") as demo:
    gr.Markdown('Health advice database')
    load_index()

    with gr.Tab("chatbot"):
        chatbot = gr.Chatbot()
        message = gr.Textbox()
        message.submit(chat, [chatbot, message], chatbot)

demo.queue().launch(share=False)

Is vectorizing the CSV file and then querying it the right approach, or is there another way?

Any kind of help is really appreciated, especially if there is an example. I have almost zero coding experience, and most of what I have done comes from learning from everyone here and from self-study.

Kind regards, and I wish you a great day.

You can try the TAPAS model for question answering on a CSV dataset:
TAPAS
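
For reference, here is a minimal sketch of how TAPAS can be used through the Hugging Face transformers table-question-answering pipeline; the checkpoint name google/tapas-base-finetuned-wtq is just one commonly used example, and TAPAS expects every table cell to be a string:

import pandas as pd
from transformers import pipeline

# TAPAS answers by selecting (and optionally aggregating) cells from the table itself
table = pd.read_csv("health.csv").astype(str)
tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")

result = tqa(table=table, query="How many times should I use dental floss per day?")
print(result["answer"])  # text assembled from the selected cells

Since the WTQ checkpoint is tuned for cell selection and aggregation, it is worth testing how it behaves with long free-text answer cells.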

vpkprasanna
Thank you very much for the input.

I will definitely try it!

An alternative approach could be to use a semantic search model, because this is more of a retrieval problem than a generation one. If I understand correctly, you need people to be able to ask questions in natural language, which are then mapped to the most similar question in your database, which in turn is mapped to the right answer. Fuzzy search methods might also be of use. Sentence transformers are one way to do it, and their documentation also shows how to do it with Hugging Face models. Faiss is good for this.
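
As a rough sketch of what that looks like with sentence-transformers (the model name all-MiniLM-L6-v2 is just a common default, answer is a made-up helper name, and the column names match the CSV shown above), where the reply is again returned verbatim from the answer column:

import pandas as pd
from sentence_transformers import SentenceTransformer, util

df = pd.read_csv("health.csv")
model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed the stored questions once, up front
question_embeddings = model.encode(df["Question"].tolist(), convert_to_tensor=True)

def answer(user_question: str) -> str:
    # Embed the incoming question and find the closest stored question
    query_embedding = model.encode(user_question, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, question_embeddings)[0]
    best = int(scores.argmax())
    # Return the paired answer exactly as it appears in the CSV
    return df["Answer"].iloc[best]

print(answer("How often should I floss?"))

For a large CSV you would put the question embeddings into a Faiss index instead of comparing against every row in memory, but for a small file a plain cosine similarity over all rows is fine.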

Out of interest, since I have a similar problem of my own, how did you solve this in the end?