Error in chat-your-data space when a query is input

Hi Friends,

~My Project / Background~
I'm using a chat-your-data space to query a .txt file and ask it questions via a prompt through the OpenAI API. It uses LangChain to ingest data from a source file (in my case a scientific publication; in Harrison's, the recent State of the Union Address), formats it into a .pkl file, and then runs an app that lets you query the extracted data through a prompt to the OpenAI API.

I cloned a chat-your-data space from Harrison Chase's work/commit (ref 1). It seems about seven other people (ref 2) are also running various chat-your-data clones (I tested their models; they are working). The working models and mine all use Harrison's code and only change the name of the .txt file to be ingested/queried in app.py, ingest_data.py, query_data.py, etc., along with a new .pkl (pickle file, i.e. the ingested .txt data).

~Jupyter Notebook troubleshooting~
To troubleshoot, I ran all the code in my Jupyter notebook, semi-successfully, until getting the following error (ref 3). So for whatever reason my code is building a prompt with too many tokens. I don't really understand why, since my question isn't longer than the questions used in the other working models from ref 1 and ref 2. Could it be something about running it locally, since I'm not in the Hugging Face virtual environment when I get the too-many-tokens error? I don't understand that at all, but it's not the real issue, since ultimately I want to run the code on Hugging Face, not in a Jupyter notebook.
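For what it's worth, my understanding is that the chain stuffs the four retrieved chunks (k=4 in the traceback below) plus the question into a single prompt, so if the splitter produces big chunks the prompt can blow past the 4097-token limit. A sketch of capping chunk size in ingest_data.py (the chunk_size/chunk_overlap numbers are my guesses, not values from Harrison's repo):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Guessed values: 4 chunks of ~1000 characters each stays well under the
# 4097-token context window, leaving room for the question and completion.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,  # overlap preserves context across chunk boundaries
)
documents = text_splitter.split_documents(raw_documents)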

~Error shown when running on Hugging Face (Open Logs)~
When I try to run the code in a Hugging Face space, the GUI loads and I can input my API key and ask a query, but inputting any query results in an error (ref 4). I only changed the code to swap Harrison's state-of-the-union.txt for my SMR4 publication.txt, so I am stumped as to why I'm getting those LangChain/FAISS errors in ref 4.

~Help~
Any advice? Thanks, friends.

~References~
ref 1:

ref 2:
huggingface.co/spaces?search=chat%20your%20data

ref 3:
InvalidRequestError: This model’s maximum context length is 4097 tokens, however you requested 4539 tokens (4283 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
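Side note: to see where the 4283 prompt tokens come from, one can count the tokens of the stuffed prompt before it is sent. My own sketch, assuming tiktoken is installed; the 4097 limit suggests a davinci-class model, which uses the p50k_base encoding:

import tiktoken

enc = tiktoken.get_encoding("p50k_base")  # encoding used by text-davinci-003
prompt = "..."  # stand-in for the question plus the retrieved chunks
print(len(enc.encode(prompt)))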

ref 4:
A Python traceback indicating a problem with a function call. The error occurs in the search() call made by the faiss.py module of the LangChain library.
The message says three positional arguments are missing from the search() call: k, distances, and labels, which the underlying search() signature requires.

Actual log:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/gradio/routes.py", line 344, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/.local/lib/python3.8/site-packages/gradio/blocks.py", line 1012, in process_api
    result = await self.call_function(
  File "/home/user/.local/lib/python3.8/site-packages/gradio/blocks.py", line 830, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.local/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "app.py", line 45, in __call__
    raise e
  File "app.py", line 42, in __call__
    output = chain({"question": inp, "chat_history": history})["answer"]
  File "/home/user/.local/lib/python3.8/site-packages/langchain/chains/base.py", line 142, in __call__
    raise e
  File "/home/user/.local/lib/python3.8/site-packages/langchain/chains/base.py", line 139, in __call__
    outputs = self._call(inputs)
  File "/home/user/.local/lib/python3.8/site-packages/langchain/chains/chat_vector_db/base.py", line 91, in _call
    docs = self.vectorstore.similarity_search(new_question, k=4, **vectordbkwargs)
  File "/home/user/.local/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 163, in similarity_search
    docs_and_scores = self.similarity_search_with_score(query, k)
  File "/home/user/.local/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 133, in similarity_search_with_score
    docs = self.similarity_search_with_score_by_vector(embedding, k)
  File "/home/user/.local/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 107, in similarity_search_with_score_by_vector
    scores, indices = self.index.search(np.array([embedding], dtype=np.float32), k)
TypeError: search() missing 3 required positional arguments: 'k', 'distances', and 'labels'
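The failing call is the raw FAISS index's search(), which suggests the unpickled index didn't rebind faiss's Python convenience wrapper on the Space, i.e. the pickled index and the installed faiss build don't match. A quick diagnostic I could run on the Space (my own sketch, not part of Harrison's code):

import pickle

import numpy as np
import faiss  # whatever faiss build is installed on the Space

print(faiss.__version__)

with open("vectorstore.pkl", "rb") as f:
    vectorstore = pickle.load(f)

# The swig-wrapped index normally accepts (queries, k) and returns
# (distances, labels); if this raises the same TypeError, the pickled
# index is incompatible with this machine's faiss build.
index = vectorstore.index
dummy = np.zeros((1, index.d), dtype=np.float32)
scores, ids = index.search(dummy, 4)
print(scores, ids)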

ref 5:

I had earlier created a vectorstore file on a Windows laptop and was trying to use it on Linux on Heroku, and was getting the below error:
TypeError: search() missing 3 required positional arguments: 'k', 'distances', and 'labels'


Solution: the vectorstore has to be created in the same environment (CPU arch) as the one where it is used for querying (Linux on Heroku in my scenario).

# Load Data to Vectorstore
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)

Hi abhishek1047,

Thanks for your response. I think my code is already using the same calls you mentioned in your solution; I've highlighted them below (the "# Load Data to vectorstore" block) in the file ingest_data.py:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredFileLoader
from langchain.vectorstores.faiss import FAISS
from langchain.embeddings import OpenAIEmbeddings
import pickle

# Load Data
loader = UnstructuredFileLoader("SMR4 publication.txt")
raw_documents = loader.load()

# Split text
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(raw_documents)

# Load Data to vectorstore
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)

# Save vectorstore
with open("vectorstore.pkl", "wb") as f:
    pickle.dump(vectorstore, f)
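Following your environment point, my next idea is to rebuild the index inside the Space at startup instead of unpickling one built on my laptop, so the index comes from the same faiss build that queries it. A rough sketch of what I mean (my own workaround idea, not Harrison's code; it assumes OPENAI_API_KEY is set as a Space secret, since embeddings are computed before the user enters a key in the GUI):

# At the top of app.py, before launching the Gradio UI: rebuild the
# FAISS index on the Space itself rather than loading vectorstore.pkl.
from langchain.document_loaders import UnstructuredFileLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS

loader = UnstructuredFileLoader("SMR4 publication.txt")
raw_documents = loader.load()
documents = RecursiveCharacterTextSplitter().split_documents(raw_documents)
vectorstore = FAISS.from_documents(documents, OpenAIEmbeddings())

The tradeoff is that embeddings are recomputed on every restart, and the key has to live in the Space settings rather than the text box.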