Hosting a HF Space for Ultra-Large Language Models

Hi all!

My collaborators and I would like to host a research project demo on Spaces. The challenge is that we operate on ultra-large language models such as GPT-2 XL (~6 GB VRAM) and GPT-J-6B (~24 GB VRAM in fp32). Our code itself does not use much VRAM beyond loading the model: it basically makes some user-specified changes to the model and then lets users generate text with the modified model.

It seems like we can fit GPT-2 XL into the 16 GB VRAM allowance of a T4, but what about GPT-J-6B? How is it even hosted on its Hub page at EleutherAI/gpt-j-6B?
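For anyone reading along, the workaround we are currently eyeing is half precision: GPT-J-6B's weights drop from ~24 GB in fp32 to ~12 GB in fp16, which should fit a 16 GB T4. A minimal sketch, assuming a recent transformers release with PyTorch and the float16 branch of the EleutherAI/gpt-j-6B repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the half-precision weights (~12 GB instead of ~24 GB fp32),
# which should fit within a T4's 16 GB of VRAM.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",       # fp16 weight branch of the repo
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,   # avoid a second full copy in host RAM
).to("cuda")

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The `low_cpu_mem_usage=True` flag also matters on Spaces, where host RAM is limited, since it avoids materializing a full extra copy of the weights while loading.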

Thanks for your insight :slight_smile:


We discussed this over email, but let's also try to paste the relevant discussion items here in the future so that interesting discussions remain publicly available :pray:

For example, I would very much love to hear any tips on loading GPT-J in different contexts (Colab GPU, TPU VM v3-8, PyTorch, Flax, et cetera) and on training/finetuning possibilities. Is there an open discussion somewhere with the latest updates from Hugging Face? There are topics and issues spread all over, some outdated, some contradictory.
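To make the "different contexts" question concrete, here is the Flax/TPU variant I have been sketching. Assumptions: transformers ships FlaxGPTJForCausalLM, and the EleutherAI/gpt-j-6B repo provides Flax weights (if not, `from_pt=True` should convert the PyTorch checkpoint):

```python
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxGPTJForCausalLM

# Note: `dtype` sets the computation dtype only; the parameters are
# still loaded in fp32, so they are cast to bf16 explicitly below.
model = FlaxGPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    dtype=jnp.bfloat16,
    # from_pt=True,  # assumption: uncomment if the repo lacks Flax weights
)
model.params = model.to_bf16(model.params)  # halve parameter memory

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
inputs = tokenizer("Hello, my name is", return_tensors="np")

# Sampling; Flax `generate` uses a PRNG key (defaults to PRNGKey(0)).
outputs = model.generate(inputs.input_ids, max_length=32, do_sample=True)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```

If anyone has tried this on an actual v3-8 (or in Colab) and hit memory issues during the initial fp32 load, I would love to hear how you worked around it.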