Which is the best way to have and deploy a local LLM?

BrokenSoul · February 8, 2024, 11:46pm

I am a beginner in AI and I was wondering: what is the best way to deploy projects in production?

I can use transformers from Hugging Face to download models, but then I would have to download the model(s) every time I deploy my project. Alternatively, Hugging Face offers Inference Endpoints, where I would only have to deploy once.

Is downloading the model directly only meant for testing and not recommended in production? And is loading models in, for example, .gguf format meant for fully local LLMs on my own server?

Thanks.

DP13 · February 9, 2024, 10:08am

I have never tried Inference Endpoints, but if you deploy your model without one, the model is downloaded only once. Every time after that it is just initialized, as long as you don't delete your runtime.
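
To check whether a model is already on disk and so won't be downloaded again, something like the sketch below might help. It assumes the usual Hub cache layout (a hub folder under HF_HOME, with one models--org--name folder per repository) and ignores other settings such as XDG_CACHE_HOME. The helper names hub_cache_dir and is_cached are made up for this example and are not part of any library.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None) -> Path:
    """Best-effort guess at the Hub cache folder (assumption: default layout)."""
    env = os.environ if env is None else env
    # HF_HUB_CACHE, when set, points straight at the cache folder.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    # Otherwise the cache lives in the "hub" folder under HF_HOME.
    hf_home = env.get("HF_HOME") or os.path.join(Path.home(), ".cache", "huggingface")
    return Path(hf_home) / "hub"


def is_cached(model_id: str, env=None) -> bool:
    """Return True if a folder for this model id exists in the cache."""
    # Cached repositories are stored as models--{org}--{name}.
    folder = "models--" + model_id.replace("/", "--")
    return (hub_cache_dir(env) / folder).is_dir()
```

If the runtime (and with it this folder) is deleted between deployments, the check returns False and the next from_pretrained call downloads the model again.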

gugaio · February 9, 2024, 12:36pm

You don't need to download the model each time. AutoModel.from_pretrained accepts either a model name (in which case it downloads from Hugging Face) or a local directory path like "./modelpath", in which case the model is loaded from that directory.

Example:

from transformers import AutoModel
model = AutoModel.from_pretrained('./my-model-directory')
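
Building on the example above, one possible pattern is to download once, save a copy with save_pretrained, and load from that directory on later deployments. This is only a sketch: resolve_model_source and load_model are hypothetical helper names, and it assumes that a config.json in the directory means a usable saved model is there (save_pretrained writes that file).

```python
from pathlib import Path


def resolve_model_source(local_dir: str, hub_id: str) -> str:
    """Return the local directory if it already holds a saved model, else the Hub id."""
    path = Path(local_dir)
    # Assumption: a config.json marks a directory written by save_pretrained.
    if (path / "config.json").is_file():
        return str(path)
    return hub_id


def load_model(local_dir: str, hub_id: str):
    """Load from disk when possible; otherwise download once and keep a local copy."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModel

    source = resolve_model_source(local_dir, hub_id)
    model = AutoModel.from_pretrained(source)
    if source == hub_id:
        # First run: save the weights so later deployments skip the download.
        model.save_pretrained(local_dir)
    return model
```

A call such as load_model("./my-model-directory", "org/model-name") would then hit the network only on the first run. The model id here is a placeholder, and a tokenizer would need the same treatment.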
