Hi, I am trying to learn how to use RAG/DPR, but first I want to get familiar with FAISS.
I checked the official example in
However, the snippet does not seem to run on its own.
So I made some modifications, aiming to retrieve examples from the sst2 dataset that are similar to the query 'I am happy.'
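Conceptually, this kind of retrieval is nearest-neighbour search. Every sentence has to become one fixed-length float vector, and the query has to be a vector of the same length. The sketch below uses plain NumPy and made-up toy vectors to show those shapes. It runs an exact squared-L2 search, which is what a flat FAISS index such as faiss.IndexFlatL2 computes, so it illustrates the idea rather than the datasets implementation. The helper name nearest_examples is hypothetical.

```python
import numpy as np


def nearest_examples(corpus: np.ndarray, query: np.ndarray, k: int):
    """Exact squared-L2 nearest-neighbour search over an (n, d) float32 matrix.

    A flat FAISS index (e.g. faiss.IndexFlatL2) performs the same computation;
    plain NumPy is used here only to make the expected shapes explicit.
    """
    if corpus.ndim != 2 or query.ndim != 1 or query.shape[0] != corpus.shape[1]:
        raise ValueError("corpus must be (n, d) and query must be (d,)")
    # Squared L2 distance from the query to every row of the corpus.
    dists = ((corpus - query) ** 2).sum(axis=1)
    # Indices of the k smallest distances, closest first.
    order = np.argsort(dists)[:k]
    return dists[order], order


# Hypothetical toy "embeddings": 4 sentences, 3 dimensions each.
corpus = np.array(
    [[0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.9, 0.1, 0.0],
     [0.0, 1.0, 0.0]],
    dtype=np.float32,
)
query = np.array([1.0, 0.0, 0.0], dtype=np.float32)
scores, indices = nearest_examples(corpus, query, k=2)
print(indices)  # → [1 2]
```

The key point for the code that follows is that each row in the indexed column must be a 1-D list of floats, all with the same length.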
import datasets
from transformers import pipeline
embed = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")
ds = datasets.load_dataset('glue', 'sst2', split='test')
ds_with_embeddings = ds.map(lambda example: {'embeddings': embed(example['sentence'])})
ds_with_embeddings.add_faiss_index(column='embeddings')
# query
scores, retrieved_examples = ds_with_embeddings.get_nearest_examples('embeddings', embed('I am happy.'), k=10)
# save index
ds_with_embeddings.save_faiss_index('embeddings', 'my_index.faiss')
# reload the original dataset (without the embeddings column)
ds = datasets.load_dataset('glue', 'sst2', split='test')
# load index
ds.load_faiss_index('embeddings', 'my_index.faiss')
# query
scores, retrieved_examples = ds.get_nearest_examples('embeddings', embed('I am happy.'), k=10)
My problem is with ds_with_embeddings.add_faiss_index(column='embeddings'), which fails with "TypeError: float() argument must be a string or a number, not 'dict'".
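The sentiment-analysis pipeline returns a list of {'label': ..., 'score': ...} dicts for each input, not a vector of numbers. Assuming the index build converts each row to a float32 array (an assumption, not checked against the datasets source), that conversion would hit a dict and fail. This minimal NumPy sketch reproduces the same kind of TypeError without datasets. The label/score values and the helper name as_float32_vector are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for one value of the 'embeddings' column: a list of
# {'label', 'score'} dicts, as the sentiment-analysis pipeline returns.
row = [{"label": "5 stars", "score": 0.7}]


def as_float32_vector(value):
    # Indexing needs each row as a 1-D float32 array; dicts cannot be converted.
    return np.asarray(value, dtype=np.float32)


try:
    as_float32_vector(row)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```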
If I change it to
ds_with_embeddings_score = ds_with_embeddings.map(lambda example: {'embeddings_score': example['embeddings'][0]['score']})
ds_with_embeddings_score.add_faiss_index(column='embeddings_score')
then I get "TypeError: len() of unsized object".
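This second message is what NumPy raises when len() is called on a zero-dimensional array, and a single float per row becomes exactly that. Presumably the index needs the vector dimension from the first row (an assumption about where it fails). A small sketch, with a made-up score value:

```python
import numpy as np

# Each 'embeddings_score' value is a single float (0.7 here is made up),
# so as an array it has zero dimensions and therefore no length.
score_row = np.asarray(0.7, dtype=np.float32)
print(score_row.ndim)  # → 0

try:
    len(score_row)
    len_failed = False
except TypeError as err:
    len_failed = True
    print("TypeError:", err)

# Wrapping the score in a list gives a 1-D vector of length 1, which does have
# a length, but a single sentiment score is a very weak "embedding".
vector_row = np.asarray([0.7], dtype=np.float32)
print(len(vector_row))  # → 1
```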
Any advice? Thanks.
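One direction that may work is an untested sketch, not a confirmed fix. It swaps the sentiment-analysis pipeline for a "feature-extraction" pipeline, which returns per-token hidden states shaped [1][seq_len][hidden_size] for a single string. It then mean-pools those states into one fixed-length vector per sentence and passes the query to get_nearest_examples as a float32 NumPy array. The mean pooling, the reuse of the nlptown checkpoint as an encoder, and the helper names mean_pool, embed_text and build_and_query are all assumptions made for illustration. A dedicated encoder such as DPR's would likely retrieve better.

```python
from typing import Callable, List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors (seq_len x hidden) into one hidden-size vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(tok[j] for tok in token_vectors) / n for j in range(dim)]


def embed_text(extractor: Callable, text: str) -> List[float]:
    # For a single string the feature-extraction pipeline returns a nested list
    # shaped [1][seq_len][hidden_size]; take the only batch item and pool it.
    return mean_pool(extractor(text)[0])


def build_and_query(query: str = "I am happy.", k: int = 10):
    # Heavy imports live inside the function so mean_pool/embed_text can be
    # used (and tested) without transformers, datasets or faiss installed.
    import datasets
    import numpy as np
    from transformers import pipeline

    extractor = pipeline(
        "feature-extraction",
        model="nlptown/bert-base-multilingual-uncased-sentiment",
    )
    ds = datasets.load_dataset("glue", "sst2", split="test")
    # Each row now stores a plain list of floats of length hidden_size.
    ds = ds.map(lambda ex: {"embeddings": embed_text(extractor, ex["sentence"])})
    ds.add_faiss_index(column="embeddings")
    # The query must be a 1-D array with the same length as the stored vectors.
    q = np.asarray(embed_text(extractor, query), dtype=np.float32)
    return ds.get_nearest_examples("embeddings", q, k=k)
```

Calling build_and_query() downloads the model and dataset, embeds every test sentence, and returns (scores, retrieved_examples) in the same form as in the original snippet.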