My collaborators and I would like to host a research project demo on Spaces. The challenge is that we operate on large language models such as GPT-2 XL (requires ~6 GB of VRAM) and GPT-J-6B (~24 GB in fp32). Our code itself does not use much VRAM beyond loading the model: it basically makes some user-specified changes to the model and lets users generate text with the modified model.
It seems like we can fit GPT-2 XL into our 16 GB T4 allowance, but what about GPT-J-6B? How is it even hosted at EleutherAI/gpt-j-6B on Hugging Face?
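For context, here is roughly what we had in mind for loading it. This is just a sketch, assuming the `float16` revision on the Hub (which, if we understand the model card correctly, halves the weight footprint to ~12 GB) would leave enough headroom for generation on a T4; we haven't verified that it actually fits:

```python
# Sketch: load the half-precision branch of GPT-J-6B so the weights take
# ~12 GB instead of ~24 GB. Assumes a CUDA device is available and that
# the "float16" revision exists on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",          # half-precision weights stored on the Hub
    torch_dtype=torch.float16,   # keep fp16 in memory rather than upcasting
    low_cpu_mem_usage=True,      # avoid holding a second full copy while loading
).to("cuda")

# Minimal generation check with an illustrative prompt.
prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Would this kind of fp16 loading be the intended way to fit a 6B-parameter model on a 16 GB GPU, or does the hosted widget do something else entirely?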
Thanks for your insight!