I work with long protein sequences (more than 15,000 characters each) and would like to compute embeddings for them using Rostlab/prot_t5_xl_half_uniref50-enc. I pass each sequence to the model whole, because I suspect that splitting a protein sequence into chunks would change the resulting embeddings. Unfortunately, the process fails with an OutOfMemoryError:
OutOfMemoryError: CUDA out of memory. Tried to allocate 42.46 GiB. GPU 0 has a total capacty of 14.75 GiB of which 3.80 GiB is free. Process 13042 has 10.95 GiB memory in use. Of the allocated memory 10.02 GiB is allocated by PyTorch, and 127.66 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I have 4 x 16 GiB GPUs available. Do you know whether there is a way to run distributed inference across them without splitting the sequence or quantising the model?
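For context, a rough back-of-the-envelope estimate may help explain the size of the failed allocation. The sketch below assumes that the dominant tensor is a full self-attention score matrix of shape (batch, heads, length, length), that the model has 32 attention heads, and that the scores are held in 32-bit floats at that point. None of these is confirmed by the error message itself, and the helper name `attention_scores_gib` is made up for illustration.

```python
def attention_scores_gib(seq_len, num_heads=32, bytes_per_element=4, batch_size=1):
    """Estimate the size in GiB of one (batch, heads, seq_len, seq_len) tensor.

    num_heads=32 and bytes_per_element=4 (float32) are assumptions about the
    model and its softmax precision, not values read from the error message.
    """
    n_bytes = batch_size * num_heads * seq_len * seq_len * bytes_per_element
    return n_bytes / 2**30


if __name__ == "__main__":
    # Memory grows quadratically with sequence length.
    for length in (1_000, 5_000, 15_000, 20_000):
        print(f"{length:>6} tokens: {attention_scores_gib(length):8.2f} GiB")
```

Under these assumptions, a single score tensor for a 15,000-token input would already be larger than one 16 GiB card, so placing different layers on different GPUs would not by itself make that tensor fit. If so, the question becomes whether the attention computation itself can be distributed or made memory-efficient.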