RAG isn't working as expected

This is my code:

import faiss
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import pickle
import os

# Load pickle files
data = []
for filename in os.listdir('.'):  # replace with your directory
    if filename.endswith('.pickle'):
        with open(os.path.join('./', filename), 'rb') as f:
            datapoint = pickle.load(f)
            if datapoint != "nothing":
                data.append(datapoint)

# Load pre-trained model for encoding text to vectors
model_data_encoding = SentenceTransformer('all-MiniLM-L6-v2')

# Encode all data to vectors
data_vectors = [model_data_encoding.encode(f"{datapoint['names']}: {datapoint['axioms']}") for datapoint in data]

# Build a FAISS index
dimension = data_vectors[0].shape[0]
index = faiss.IndexFlatL2(dimension)
for vector in data_vectors:
    index.add(vector.reshape(1, -1))

# Load pre-trained LLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id

def answer_question(question):
    # Encode question to vector
    question_vector = model_data_encoding.encode(question)

    # Find the most similar documents in the database
    distances, indices = index.search(question_vector.reshape(1, -1), 5)

    # Get the corresponding data (use `i`, not `index`, to avoid shadowing the FAISS index)
    relevant_data = [data[i] for i in indices[0]]

    # Format the retrieved data into a single prompt for the LLM
    prompt = (
        "Given these retrieved axioms from a mathematical database:\n"
        + "\n".join(f"{datapoint['names'][0]}: {datapoint['axioms']}" for datapoint in relevant_data)
        + "\nAnswer this question:\n" + question + "\nAnswer: "
    )
    print(prompt)
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate a response (temperature only takes effect when do_sample=True)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)

    # Decode the response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Parse out the first sentence after the "Answer:" marker
    first_sentence = response.split("Answer:")[1].split(".")[0]

    return first_sentence

question = "what is the LA-semihypergroup axiom?"

response = answer_question(question)

print(response)

The response I am getting is:
The LA-semihypergroup axiom is a generalization of the axiom of the inverse semigroup

Instead of the “obvious” answer:
(x*y)*z = (z*y)*x

Why could this be happening and how can I improve it?

To get this formula-based answer, you need the formula itself to be present in your external data (the pickles, I guess), plus an embedding model with a tokenizer that supports mathematical language. I am afraid all-MiniLM-L6-v2 is not like this. It is still hard to give a precise answer, but these are the two directions I would check.
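
For the first direction, a quick sanity check is whether the embedding model even places the question close to the stored axiom text. A minimal sketch with sentence-transformers (the axiom string here is just an illustration of what one of your datapoints might encode to):

from sentence_transformers import SentenceTransformer, util

model_data_encoding = SentenceTransformer('all-MiniLM-L6-v2')

question = "what is the LA-semihypergroup axiom?"
axiom_text = "LA-semihypergroup: (x*y)*z = (z*y)*x"  # illustrative datapoint, not from the real pickles

# Cosine similarity between the question and the stored text; a low score
# means this entry is unlikely to be retrieved (or ranked highly) for the question.
score = util.cos_sim(
    model_data_encoding.encode(question),
    model_data_encoding.encode(axiom_text),
)
print(float(score))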

The retrieval is working perfectly fine, though; the problem is really the generation. If I put the prompt with the retrieved axioms into ChatGPT, it works fine. Could this just be a problem with the gpt2 model?

Very likely. If the retrieval part works, the problem is with GPT-2 synthesizing the answer. Try running your retriever against the GPT-4 API with llama_index and see if you get the same result as in ChatGPT. There are tons of examples on the llama_index docs site.
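
For example, here is a minimal sketch along the lines of the llama_index quickstart (assuming a recent llama-index release, an OPENAI_API_KEY in the environment, and the data list already loaded from your pickles; the exact import paths vary between versions):

from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4")  # use GPT-4 for the answer-synthesis step

# Wrap each pickle record as a Document, mirroring the names/axioms fields from your code
documents = [Document(text=f"{d['names']}: {d['axioms']}") for d in data]

# Build an in-memory vector index and run retrieval + generation end to end
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=5)

print(query_engine.query("what is the LA-semihypergroup axiom?"))

If that returns the formula, your data and retriever are fine and GPT-2 is the weak link.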