I have a large CSV file (35M rows) in the following format:

```
id, sentence, description
```
Normally, in inference mode, I'd use the model like so:

```python
model = SentenceTransformer('flax-sentence-embeddings/some_model_here', device=gpu_id)
for row in iter_through_csv:
    encs = model.encode(row, normalize_embeddings=True)
But since I have GPUs, I'd like to batch the rows instead of encoding them one at a time. However, the file is large (35M rows), so I don't want to read it all into memory before batching.

I am struggling to find a template for batching a CSV with Hugging Face. What is the most efficient way to do this?
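For reference, a minimal sketch of the kind of streaming I have in mind, using only the stdlib `csv` module so nothing is held in memory beyond one batch. `batched_rows`, `'data.csv'`, and the batch sizes are placeholders of mine, not part of any library API:

```python
# Stream a CSV in fixed-size batches instead of loading all 35M rows at once.
import csv
from itertools import islice

def batched_rows(path, batch_size=256, column=1):
    """Yield lists of up to `batch_size` values from one CSV column, streaming."""
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row (id, sentence, description)
        while True:
            batch = [row[column] for row in islice(reader, batch_size)]
            if not batch:
                break
            yield batch

# Hypothetical usage: the model is loaded once, outside the loop, and each
# batch of sentences is encoded on the GPU in one call.
# model = SentenceTransformer('flax-sentence-embeddings/some_model_here', device=gpu_id)
# for sentences in batched_rows('data.csv', batch_size=1024):
#     encs = model.encode(sentences, batch_size=1024, normalize_embeddings=True)
```

The key point is that `model.encode` accepts a list of sentences and batches them on the GPU internally, so the generator only needs to hand it chunks of the file.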