Hi Friends,
~My Project / Background~
I’m using a chat-your-data Space to query a .txt file and ask questions about it via a prompt through the OpenAI API. The app uses LangChain to ingest data from a source file (in my case a scientific publication; in Harrison’s original below, the recent State of the Union address), serializes it into a .pkl file, and then runs an app that sends queries about the extracted data through a prompt to the OpenAI API.
I cloned a chat-your-data Space from Harrison Chase’s work/commit (ref 1). About seven other people (ref 2) are also running various chat-your-data clones (I tested their Spaces; they work). The working clones, like mine, use Harrison’s code and only change the name of the .txt file to be ingested/queried in app.py, ingest_data.py, query_data.py, etc., along with a new .pkl (the pickle file, i.e. the ingested .txt data).
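For anyone unfamiliar with the pipeline, the ingest step boils down to something like this (a simplified, self-contained sketch: the real ingest_data.py uses LangChain’s text splitter plus OpenAI embeddings and a FAISS vectorstore, which I’ve replaced here with plain character chunking and pickle so it runs anywhere):

```python
import pickle

def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character chunks,
    roughly what a LangChain text splitter does."""
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# In the real ingest_data.py, each chunk would be embedded via the OpenAI
# API, stored in a FAISS index, and the vectorstore pickled to a .pkl file.
doc = "word " * 1000          # stand-in for the contents of the .txt file
chunks = chunk_text(doc)
blob = pickle.dumps(chunks)   # stand-in for writing the .pkl file
```

Swapping in a different .txt file only changes `doc` here, which is why it’s surprising that the clones break.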
~Jupyter Notebook troubleshooting~
To troubleshoot, I ran all the code in my Jupyter notebook, semi-successfully, until hitting the following error (ref 3). So for whatever reason my code is building a prompt with too many tokens. I don’t really understand why I’m getting a too-many-tokens error, since my prompt isn’t longer than the prompts in the other working models from ref 1 and ref 2. Could it be something about running it locally, since I’m not in the Hugging Face virtual environment when I get the too-many-tokens error? I don’t understand that at all, but it’s not the real issue, since ultimately I want to run the code on Hugging Face, not in a Jupyter notebook.
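One thing worth noting: the prompt the model sees isn’t just my question, it’s the prompt template plus the k retrieved chunks plus the question, so big chunks from the publication can blow the context budget even when the question is short. A rough sanity check (using the common ~4 characters per token rule of thumb; an exact count would use OpenAI’s tiktoken library, and `chunk_chars` here is a hypothetical chunk size, not measured from my file):

```python
def approx_tokens(text):
    """Very rough token estimate: ~4 characters per token for English text.
    For exact counts you'd use the tiktoken library with the model's encoding."""
    return len(text) // 4

k = 4                     # the chain retrieves 4 chunks (k=4 in the traceback below)
chunk_chars = 4000        # hypothetical size of each ingested chunk, in characters
question = "What does the SMR4 publication conclude?"

budget = 4097 - 256       # model context length minus the requested completion tokens
needed = k * approx_tokens("x" * chunk_chars) + approx_tokens(question)
print(needed, "tokens needed vs", budget, "available")
```

With chunks that size, 4 retrieved chunks alone already exceed the budget, which would explain ref 3 even though my typed question is short.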
~Error shown when running on Hugging Face=Open Logs~
When I try to run the code in a Hugging Face Space, the GUI loads and I can input my API key and ask a query, but inputting any query results in an error (ref 4). I only changed the code to swap Harrison’s state-of-the-union.txt for my SMR4 publication.txt, so I am stumped as to why I’m getting those LangChain variable errors through FAISS in ref 4.
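My current guess (and it is only a guess): the .pkl was created under a different faiss version than the one installed in the Space, so the unpickled index’s low-level search() wrapper no longer matches the installed library’s signature. If so, either re-running ingest_data.py inside the Space’s own environment or pinning faiss in requirements.txt to the version that wrote the .pkl might fix it, along these lines (1.7.3 is a placeholder, not a known-correct version):

```text
# requirements.txt — pin faiss to the version that created the .pkl
# (1.7.3 is only an example)
faiss-cpu==1.7.3
```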
~Help~
Any advice? Thanks, friends.
~References~
ref 1:
ref 2:
huggingface.co/spaces?search=chat%20your%20data
ref 3:
InvalidRequestError: This model’s maximum context length is 4097 tokens, however you requested 4539 tokens (4283 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
ref 4:
Python error message indicating a problem with a function call in the code. The error occurs in langchain’s faiss.py module, in a call to the FAISS index’s search() function.
The message says the search() call is missing three required positional arguments: k, distances, and labels. These arguments are required by the search() function in order to perform a search.
Actual log:
Traceback (most recent call last):
File "/home/user/.local/lib/python3.8/site-packages/gradio/routes.py", line 344, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/.local/lib/python3.8/site-packages/gradio/blocks.py", line 1012, in process_api
result = await self.call_function(
File "/home/user/.local/lib/python3.8/site-packages/gradio/blocks.py", line 830, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/.local/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "app.py", line 45, in __call__
raise e
File "app.py", line 42, in __call__
output = chain({"question": inp, "chat_history": history})["answer"]
File "/home/user/.local/lib/python3.8/site-packages/langchain/chains/base.py", line 142, in __call__
raise e
File "/home/user/.local/lib/python3.8/site-packages/langchain/chains/base.py", line 139, in __call__
outputs = self._call(inputs)
File "/home/user/.local/lib/python3.8/site-packages/langchain/chains/chat_vector_db/base.py", line 91, in _call
docs = self.vectorstore.similarity_search(new_question, k=4, **vectordbkwargs)
File "/home/user/.local/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 163, in similarity_search
docs_and_scores = self.similarity_search_with_score(query, k)
File "/home/user/.local/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 133, in similarity_search_with_score
docs = self.similarity_search_with_score_by_vector(embedding, k)
File "/home/user/.local/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 107, in similarity_search_with_score_by_vector
scores, indices = self.index.search(np.array([embedding], dtype=np.float32), k)
TypeError: search() missing 3 required positional arguments: 'k', 'distances', and 'labels'