Which model for inference on 11 GB GPU?

Eichhof · October 28, 2021, 12:20am

Hello everybody

I’ve just found the amazing Huggingface library. It is an awesome piece of work.

I would like to train a chatbot on an existing dataset or on several datasets (e.g. the Pile). For training (or fine-tuning), GPU memory is not a constraint, since a 48 GB GPU is available. For inference, however, I only have an 11 GB GPU. Inference should be feasible in near real time (i.e. a response in under roughly 3 seconds), and the model should be adjustable, i.e. the source code should be available so that I can change the structure of the model.

Which model is best given these requirements? GPT-J is probably one of the best candidates, but I think it needs more than 11 GB of GPU memory for inference.
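
As a rough sanity check on that guess, the memory needed just to hold the weights is about the parameter count times the bytes per parameter. The sketch below is only an estimate under my own assumptions: that GPT-J has roughly 6 billion parameters, and that activations, the attention cache, and framework overhead can be ignored (so real usage will be higher). The function names are mine, not from any library.

```python
# Back-of-the-envelope estimate of GPU memory needed for model weights only.
# Assumption: GPT-J has roughly 6e9 parameters. Activations, attention cache,
# and CUDA/framework overhead are NOT counted, so real usage is higher.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(n_params: float, dtype: str, budget_gb: float) -> bool:
    """True if the weights alone fit within the given memory budget."""
    return weight_memory_gb(n_params, dtype) <= budget_gb


if __name__ == "__main__":
    gpt_j_params = 6e9      # assumed parameter count
    inference_budget = 11   # GB available on the inference GPU
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gb(gpt_j_params, dtype)
        verdict = "fits" if fits(gpt_j_params, dtype, inference_budget) else "does not fit"
        print(f"{dtype}: {size:.1f} GB of weights, {verdict} in {inference_budget} GB")
```

By this estimate, the weights alone would take about 24 GB in fp32 and about 12 GB in fp16, both above my 11 GB limit, while an 8-bit representation would be around 6 GB before any overhead.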

Eichhof · October 30, 2021, 10:47am

Does anybody have some input? Any input is highly appreciated.
