Distributed inference for long strings

Kavod · November 17, 2023, 3:23pm

I work with long protein sequences (more than 15,000 characters) and would like to compute embeddings for them using Rostlab/prot_t5_xl_half_uniref50-enc. I pass each sequence in whole, because I think splitting a protein sequence could change the result. Unfortunately, the process fails with an OutOfMemoryError:

OutOfMemoryError: CUDA out of memory. Tried to allocate 42.46 GiB. GPU 0 has a total capacty of 14.75 GiB of which 3.80 GiB is free. Process 13042 has 10.95 GiB memory in use. Of the allocated memory 10.02 GiB is allocated by PyTorch, and 127.66 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have 4 x 16 GiB GPUs. Is there a way to run distributed inference without splitting the sequence or quantising the model?
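For a rough sense of why a single allocation can be this large: self-attention builds a score tensor of shape (batch, heads, length, length), so its memory grows quadratically with sequence length. The sketch below is only a back-of-the-envelope estimate. The head count of 32 and the 4-byte element size are assumptions, not values read from the model config or from the traceback, and `attention_scores_gib` is a hypothetical helper name.

```python
def attention_scores_gib(seq_len: int,
                         num_heads: int = 32,      # assumption: not read from the model config
                         bytes_per_elem: int = 4,  # assumption: scores held in float32
                         batch: int = 1) -> float:
    """Estimate the size in GiB of one (batch, heads, L, L) attention score tensor."""
    return batch * num_heads * seq_len ** 2 * bytes_per_elem / 2 ** 30


if __name__ == "__main__":
    # Print the estimate for a few sequence lengths to show the quadratic growth.
    for length in (1_000, 5_000, 15_000, 20_000):
        print(f"L={length:>6}: {attention_scores_gib(length):8.2f} GiB")
```

Under these assumptions, one score tensor for a sequence of 15,000 tokens already needs tens of GiB, which is the same order of magnitude as the allocation in the error above and more than a single 16 GiB GPU holds. It also suggests that placing different layers on different GPUs would not by itself shrink this tensor, since it is produced inside a single layer.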
