Hosting an HF Space for Ultra-Large Language Models

Hi all!

My collaborators and I would like to host a research project demo on Spaces. The challenge is that we work with ultra-large language models such as GPT-2 XL (~6 GB VRAM) and GPT-J-6B (~24 GB VRAM). Our code itself uses little VRAM beyond loading the model: it applies some user-specified changes to the model and then lets users generate text with the modified model.
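For context, the overall flow is roughly the following. This is just a minimal sketch with GPT-2 XL and the standard `transformers` API; the in-place bias edit is a hypothetical stand-in for our actual user-specified changes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model (~6 GB of fp32 weights for GPT-2 XL).
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl").to("cuda")

# Hypothetical stand-in for the user-specified model edit:
# zero one MLP bias in the first transformer block.
with torch.no_grad():
    model.transformer.h[0].mlp.c_fc.bias.zero_()

# Let the user generate text with the modified model.
inputs = tokenizer("Once upon a time", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```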

It seems like we can fit GPT-2 XL into our 16 GB allowance for T4s, but what about GPT-J-6B? How is that model even hosted at EleutherAI/gpt-j-6B on the Hub?
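One option we've been eyeing: the GPT-J-6B model card mentions a `float16` revision, which roughly halves the weights to ~12 GB and might squeeze onto a T4. A sketch of loading it that way, assuming the revision and flags described on the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the half-precision checkpoint (~12 GB of weights instead of ~24 GB),
# which should fit within a T4's 16 GB of VRAM. revision="float16" and
# low_cpu_mem_usage come from the GPT-J-6B model card.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")
```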

Thanks for your insight :slight_smile:


Discussed over email, but let's also try to paste the relevant discussion items here in the future so that interesting discussions stay publicly available :pray: