My collaborators and I would like to host a research project demo on Spaces. The challenge is that we work with ultra-large language models such as GPT-2 XL (~6 GB of VRAM) and GPT-J-6B (~24 GB of VRAM in fp32). Our code itself uses little VRAM beyond loading the model: it applies some user-specified changes to the model and then lets users generate text with the modified model.
It seems like we can fit GPT-2 XL into our 16 GB allowance for T4s, but what about GPT-J-6B? How is this even hosted at EleutherAI/gpt-j-6B · Hugging Face?
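For reference, one common way to squeeze GPT-J-6B under 16 GB is to load the half-precision weights. Here is a minimal sketch, assuming a recent transformers release with GPT-J support; the `float16` revision and the flags below follow the model card, but double-check them against your installed version:

```python
import torch
from transformers import AutoTokenizer, GPTJForCausalLM

# The float16 branch of the repo holds half-precision weights (~12 GB
# instead of ~24 GB), which is what makes a 16 GB T4 plausible for inference.
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",          # fp16 weights branch of the model repo
    torch_dtype=torch.float16,   # keep weights in half precision
    low_cpu_mem_usage=True,      # avoid a second full copy in CPU RAM while loading
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
```

Generation on top of that adds comparatively little memory (mainly the KV cache for the generated tokens), so there should be some headroom left for user-specified model edits; fine-tuning on a single T4 is a different matter.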
We discussed this over email, but let's also try to post the relevant discussion items here in the future so that interesting discussions stay publicly available.
For example, I would very much love to hear any tips on loading GPT-J in different contexts (Colab GPU, TPU VM v3-8, PyTorch, Flax, etc.) and on training / fine-tuning possibilities. Is there an open discussion somewhere with the latest updates from Hugging Face? There are topics and issues spread all over, some outdated, some contradictory.
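To give the TPU/Flax case a concrete starting point, here is a rough sketch of what loading on a TPU VM might look like. `FlaxGPTJForCausalLM` and the `to_bf16` helper exist in recent transformers versions, but whether the hub repo ships Flax weights is an assumption here (`from_pt=True` would convert the PyTorch weights otherwise):

```python
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxGPTJForCausalLM

# Rough sketch for a TPU VM (v3-8); bfloat16 is the natural dtype on TPUs.
# dtype sets the computation dtype; casting the params with to_bf16
# additionally halves the parameter memory.
model = FlaxGPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    dtype=jnp.bfloat16,  # run computations in bf16
    # from_pt=True,      # uncomment if the repo has no Flax weights
)
model.params = model.to_bf16(model.params)  # store weights in bf16 too

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
```

If anyone has verified which of these paths work on Colab vs. a TPU VM, it would be great to collect that here rather than across scattered issues.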