My collaborators and I would like to host a research project demo on Spaces. The challenge is that we work with ultra-large language models such as GPT-2 XL (~6 GB of VRAM) and GPT-J-6B (~24 GB of VRAM in fp32). Our code itself uses little VRAM beyond loading the model: it applies some user-specified changes to the model and then lets users generate text with the modified model.
It seems like we can fit GPT-2 XL into our 16 GB allowance for T4s, but what about GPT-J-6B? How is this even hosted at EleutherAI/gpt-j-6B · Hugging Face?
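For reference, one common way to squeeze GPT-J-6B under 16 GB is to load the half-precision weights. Here is a minimal sketch, assuming a recent transformers release with GPT-J support; the `float16` revision and the flags below follow the model card, but double-check them against your installed version:

```python
import torch
from transformers import AutoTokenizer, GPTJForCausalLM

# The float16 branch of the repo holds half-precision weights (~12 GB
# instead of ~24 GB), which is what makes a 16 GB T4 plausible for inference.
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",          # fp16 weights branch of the model repo
    torch_dtype=torch.float16,   # keep weights in half precision
    low_cpu_mem_usage=True,      # avoid a second full copy in CPU RAM while loading
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
```

Generation on top of that adds comparatively little memory (mainly the KV cache for the generated tokens), so there should be some headroom left for user-specified model edits; fine-tuning on a single T4 is a different matter.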
We discussed this over email, but let's also try to post the relevant discussion items here in the future so that interesting discussions stay publicly available.
For example, I would very much love to hear any tips on loading GPT-J in different contexts (Colab GPU, TPU VM v3-8, PyTorch, Flax, etc.) and on training / fine-tuning possibilities. Is there an open discussion somewhere with the latest updates from Hugging Face? There are topics and issues spread all over, some outdated, some contradictory.
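To give the TPU/Flax case a concrete starting point, here is a rough sketch of what loading on a TPU VM might look like. `FlaxGPTJForCausalLM` and the `to_bf16` helper exist in recent transformers versions, but whether the hub repo ships Flax weights is an assumption here (`from_pt=True` would convert the PyTorch weights otherwise):

```python
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxGPTJForCausalLM

# Rough sketch for a TPU VM (v3-8); bfloat16 is the natural dtype on TPUs.
# dtype sets the computation dtype; casting the params with to_bf16
# additionally halves the parameter memory.
model = FlaxGPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    dtype=jnp.bfloat16,  # run computations in bf16
    # from_pt=True,      # uncomment if the repo has no Flax weights
)
model.params = model.to_bf16(model.params)  # store weights in bf16 too

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
```

If anyone has verified which of these paths work on Colab vs. a TPU VM, it would be great to collect that here rather than across scattered issues.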