It seems that the add_faiss_index functionality is not available in IterableDataset. My question is: if the dataset is big and doesn't fit in memory, how can I leverage the FAISS and Elasticsearch capabilities?
Hi! Yes, `IterableDataset` doesn't support vector similarity search, because it only gives you access to one example at a time. It seems that both Faiss and Elasticsearch support memory mapping, so we will probably add support for that to the `Dataset` class soon.
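Because a stream only exposes one example at a time, you can't build a reusable index over it, but an exact (brute-force) nearest-neighbor search is still possible in a single pass by keeping a small heap of the best matches. The sketch below is not Faiss and not a `datasets` API. It is plain heap-based top-k search, assuming each streamed example is a dict with a hypothetical `"embeddings"` column holding a vector, and using a hypothetical generator as a stand-in for iterating over an `IterableDataset`.

```python
import heapq
import math
from typing import Iterable, Iterator, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def streaming_top_k(
    examples: Iterable[dict],
    query: Sequence[float],
    k: int,
    column: str = "embeddings",  # hypothetical column name
) -> List[Tuple[float, int]]:
    """Return (score, position) pairs for the k most similar examples, best first.

    Only k candidates are held in memory; the stream itself is read once.
    """
    heap: List[Tuple[float, int]] = []  # min-heap: heap[0] is the worst kept match
    for position, example in enumerate(examples):
        score = cosine_similarity(example[column], query)
        if len(heap) < k:
            heapq.heappush(heap, (score, position))
        elif score > heap[0][0]:
            # Drop the current worst match and keep this better one.
            heapq.heapreplace(heap, (score, position))
    return sorted(heap, reverse=True)


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for iterating over an IterableDataset."""
    for vector in ([1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]):
        yield {"embeddings": vector}


if __name__ == "__main__":
    for score, position in streaming_top_k(fake_stream(), query=[1.0, 0.1], k=2):
        print(position, round(score, 3))
```

The trade-off is that every query rereads the whole stream, so this only makes sense for occasional queries; for repeated searches, an index that is memory-mapped from disk (as described in the resources below) is the better fit.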
Some external resources that could help:
- Faiss - Indexes that do not fit in RAM · facebookresearch/faiss Wiki · GitHub
- Elasticsearch (mmapfs) - Store | Elasticsearch Guide [7.16] | Elastic