RuntimeError: Error in void faiss::gpu::allocMemorySpaceV(faiss::gpu::MemorySpace, void**, size_t) at gpu/utils/MemorySpace.cpp:26: Error: 'err == cudaSuccess' failed: failed to cudaMalloc 27162080256 bytes (error 2 out of memory)
I am working on an NVIDIA Tesla K80 with a single GPU that has 11.4GB of memory. Well, it seems that with this hardware it is impossible to index on the GPU. Running the index on the CPU leads to infeasible execution time in the next step, since I would like to get the top-1000 nearest examples for 6900 queries.
Any workarounds? I’m not sure if I’ll have the resources, but if I get access to several GPUs like the one that I’m working with, I just need to specify multiple GPU IDs when loading the faiss index, right?
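Something like the following is what I have in mind (just a sketch; `cpu_index` would be the index built on the CPU):

```python
import faiss

# Sketch only - shard one CPU-built index across all visible GPUs
# instead of replicating it on each one.
co = faiss.GpuMultipleClonerOptions()
co.shard = True                                          # split the index across the GPUs
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index, co)   # cpu_index: index built on the CPU
```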
Well, it seems that you are simply running out of memory. I assume that you either have a very large index, or you also have a model on the GPU at the same time?
Thank you for your message, Bram. I have moved the models back to the CPU after generating the embeddings; only the index is on the GPU.
It seems that the index is very large: ~27.1GB. Moreover, when I run htop, the process uses 27.1GB of resident memory, which I suppose is what gets transferred to the GPU and leads to the RuntimeError above.
If you are using Faiss, I believe you can do PCA to reduce the number of dimensions of your vectors. Also, this line from Faiss’ GitHub page makes me think you can get around the large dataset size.
Faiss contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM.
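As a rough sketch of what I mean by PCA (not tested on your data; `xb` would be your 768-d context embeddings as a float32 array, and 256 is just an example target dimension):

```python
import faiss

d_in, d_out = 768, 256                       # reduce the DPR vectors to 256 dims (example value)
pca = faiss.PCAMatrix(d_in, d_out)           # linear PCA transform
index = faiss.IndexPreTransform(pca, faiss.IndexFlatIP(d_out))
index.train(xb)                              # trains the PCA on your embeddings (xb: float32, shape (n, 768))
index.add(xb)                                # stored vectors are now 256-d, so the index is ~3x smaller
```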
I will explore the PCA alternative and report it here.
Regarding the memory issue, whether I run on the CPU or the GPU it always needs to allocate 27.1GB of resident memory. I can run it on the CPU because I have enough RAM to fit these 27.1GB, which is not the case for the GPU.
I created my own MSMARCO dataset with these two tutorials: Writing a dataset loading script and the squad dataset loading script. Furthermore, I added the FAISS index as described in Adding a FAISS index. I have generated context embeddings with DPRContextEncoder and DPRContextEncoderTokenizer, and question embeddings with DPRQuestionEncoder and DPRQuestionEncoderTokenizer. Thus the embedding vectors have a dimension of 768.
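Roughly, the embedding step looks like this (a simplified sketch of my script; the column names and batch size are only illustrative):

```python
import torch
from transformers import DPRContextEncoder, DPRContextEncoderTokenizer

ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

def embed_passages(batch):
    # "passage_text" is a placeholder column name for the MSMARCO passages
    inputs = ctx_tokenizer(batch["passage_text"], padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        embeddings = ctx_encoder(**inputs).pooler_output   # shape (batch, 768)
    return {"embeddings": embeddings.numpy()}

dataset = dataset.map(embed_passages, batched=True, batch_size=16)
dataset.add_faiss_index(column="embeddings")
```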
I came to realize that something was not making much sense. FAISS is a library for efficient similarity search and clustering of dense vectors. Thus, I would expect that getting the top-k nearest examples would not be a “very long task”. Therefore, I ran some tests calling get_nearest_examples() with a FAISS index for two different queries (4000 and 4001), several times, on the CPU.
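The test itself is basically this (a sketch; `query_embeddings` stands for the precomputed DPRQuestionEncoder vectors):

```python
import time

# query_embeddings[4000] / [4001]: precomputed DPRQuestionEncoder vectors (float32, dim 768)
for query_id in (4000, 4001):
    start = time.time()
    scores, passages = dataset.get_nearest_examples("embeddings", query_embeddings[query_id], k=1000)
    print(f"query {query_id}: {time.time() - start:.1f}s wall time")
```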
As you can observe, the first time I try to retrieve the top-1000 passages for both queries 4000 and 4001, the wall time is ~9-10 min. After that, the wall time matches the CPU time, which seems to indicate that get_nearest_examples() is I/O bound and benefits from caching.
Should I be expecting faster retrievals on the first attempt? Notice that the problem, after all, is not the computation (there seem to be no advantages to moving the index to the GPU, as I tried in the replies above). All the CPU times were short, which makes sense since FAISS is a library for efficient similarity search and clustering of dense vectors.
There must be some sort of caching for the first query to take longer than the others.
Also, please note that right now the default index used when you do add_faiss_index is a flat L2 index.
The flat index can be quite slow so you should probably use the HNSW index from faiss.
Moreover for DPR embeddings, we have to tell the index to use the maximum inner product metric instead of L2.
To summarize, here’s how to use HNSW with maximum inner product:
import faiss
d = 768 # vectors dimension
m = 128 # hnsw parameter. Higher is more accurate but takes more time to index (default is 32, 128 should be ok)
index = faiss.IndexHNSWFlat(d, m, faiss.METRIC_INNER_PRODUCT)
my_dataset.add_faiss_index("embeddings", custom_index=index)
I checked DPR in the Facebook Research repo some days ago and, based on their DenseHNSWFlatIndexer class, I came up with the following configuration, which is the same as in the official paper:
They wrote in the comments of the code the following:
# IndexHNSWFlat supports L2 similarity only
# so we have to apply DOT -> L2 similairy space conversion with the help of an extra dimension
Which at the time I did not understand very well. Thus, I just defined the dimension of the vectors as 768 but did not tell the index to use the maximum inner product metric instead of L2. Why do we have to tell the index to use the inner product?
The DPR embeddings are trained so that the pairs of questions/contexts maximize the inner product.
When they first released the official DPR code, the maximum inner product metric was not available in faiss for HNSW yet. Therefore they used a trick that adds an extra dimension to make the L2 index behave like maximum inner product.
However, times have changed and now HNSW supports the maximum inner product metric, so you don’t need this trick anymore. You can just specify faiss.METRIC_INNER_PRODUCT in the IndexHNSWFlat arguments.
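For reference, the trick is roughly the following (a sketch, not the exact DPR code): every context vector gets an extra coordinate chosen so that minimizing L2 distance becomes equivalent to maximizing the inner product.

```python
import numpy as np

def augment_contexts(xb):
    # xb: context vectors, float32, shape (n, d)
    norms = (xb ** 2).sum(axis=1)
    phi = norms.max()                            # largest squared norm in the collection
    extra = np.sqrt(phi - norms)                 # extra coordinate per context vector
    return np.hstack([xb, extra[:, None]]).astype("float32")

def augment_queries(xq):
    # queries get a zero appended, so ||q' - x'||^2 = ||q||^2 + phi - 2 * <q, x>,
    # which means the nearest L2 neighbor is also the maximum inner product one
    return np.hstack([xq, np.zeros((len(xq), 1), dtype="float32")]).astype("float32")
```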
Oh and m=512 is a bit overkill.
According to Patrick Lewis’ work on RAG (a model based on DPR), 128 looked more reasonable.
It led to the same performances with a significant reduction of the index size (from 140GB to around 70GB).
In the final RAG implementation, they used m=128 with an SQ8 quantization, and the final index takes 40GB.
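If you want to reproduce that setup, something like this should be close (a sketch; I’m not sure it is exactly what the RAG authors used):

```python
import faiss

d = 768   # DPR embedding dimension
m = 128   # HNSW neighbors per node

# HNSW graph over 8-bit scalar-quantized vectors (roughly the "m=128 + SQ8" setup mentioned above).
# Depending on your faiss version, the constructor may also accept a metric argument.
index = faiss.IndexHNSWSQ(d, faiss.ScalarQuantizer.QT_8bit, m)
index.train(xb)   # the scalar quantizer needs a training pass (xb: float32 embeddings)
index.add(xb)
```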
efSearch is the search factor at inference time while efConstruction is used when building the index.
I think these are reasonable values.
I’ve started experimenting with these parameters recently. I can tell that an efSearch of 128 leads to way better retrieval results than the default of 16, even though it’s a bit slower. You can experiment with different values of efSearch once your index is built. I’ll share my results soon.
For efConstruction I haven’t tested yet but I’ll let you know.
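The nice thing is that efSearch can be changed on an index that is already built, so sweeping it is cheap compared to re-indexing. Roughly (variable names are just an example; `question_embedding` is a precomputed query vector):

```python
# Sketch: sweep efSearch on the existing index, no re-indexing needed
for ef in (16, 64, 128, 256):
    index.hnsw.efSearch = ef
    scores, examples = my_dataset.get_nearest_examples("embeddings", question_embedding, k=1000)
    # compare retrieval quality and latency for each value of efSearch
```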
Feel free to browse the Faiss repo and wiki if you’re interested in tuning the index parameters for your needs.
I will let you know about the execution times once I have created the index and retrieved the top-1000 passages for some MSMARCO queries with the new index.
I’m looking forward to seeing your results as well.
@lhoestq I decreased m from 512 to 128 and I am now using the inner product metric (faiss.METRIC_INNER_PRODUCT). The time to index almost doubled (from 50 hours to 100 hours); is that expected?
import faiss
store_n = 128 # neighbors to store per node - Quentin said it was enough based on RAG
ef_search = 128 # search depth
ef_construction = 200 # construction time search depth
vector_sz = 768 # vector dimension
hnsw_index = faiss.IndexHNSWFlat(vector_sz, store_n, faiss.METRIC_INNER_PRODUCT)
hnsw_index.hnsw.efSearch = ef_search
hnsw_index.hnsw.efConstruction = ef_construction
docs = dpr_doc_sys.get_documents()['train']
docs.add_faiss_index(column='embeddings', custom_index=hnsw_index)
The inner product metric makes the faiss index slower to build unfortunately…
If you really want to make it a bit faster, you could first try the default value of efConstruction (40).
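i.e. something like:

```python
hnsw_index.hnsw.efConstruction = 40  # faiss default; a lower construction-time search depth means a faster build
```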