I have a large language model that I’m using for text generation, and I have it deployed right now with GPU pinning enabled.
I’m extremely happy with the results so far.
But I frequently get CUDA out-of-memory errors if I supply too many tokens in my prompt, or if I request too many tokens in the completion. I don’t have exact numbers yet, but requests seem to fail whenever the total token count (prompt + completion) exceeds roughly 500.
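As a stopgap, I’ve been thinking about capping the requested completion length so the total stays under that observed budget. A minimal sketch of the idea (the 500-token budget is my empirical observation, not a documented quota, and `max_completion_tokens` is a hypothetical helper name):

```python
# Empirical limit from my tests, not a documented quota.
TOKEN_BUDGET = 500

def max_completion_tokens(prompt_token_count: int, budget: int = TOKEN_BUDGET) -> int:
    """Return the largest completion length that keeps prompt + completion under the budget."""
    return max(0, budget - prompt_token_count)

# A 350-token prompt would leave room for at most 150 completion tokens.
print(max_completion_tokens(350))
```

This works as a workaround, but it obviously defeats the purpose of testing longer contexts.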
The model is based on GPT-J, which can theoretically handle 2048 context tokens given sufficient memory, and I’d like to run some tests with the model closer to the limits of its capabilities.
So I’d like to ask: is it possible to upgrade my account to get a larger allotment of GPU memory?