What is the best way to run and deploy a local LLM?

I am just beginning in AI and I was wondering: what is the best way to deploy projects in production?

I can use the transformers library from Hugging Face to download models, but it seems I would have to download the model(s) again every time I deploy my project. Alternatively, Hugging Face offers Inference Endpoints, where I would only have to deploy once.

Is downloading the model directly meant only for testing and not recommended in production? And is loading models in a format such as .gguf meant for fully local LLMs on my own server?


I have never tried Inference Endpoints, but if you deploy your model without one, the model is downloaded only once. On every later run it is just initialized from the local copy, as long as you do not delete your runtime.
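To make the "download only once" behaviour explicit, here is a minimal sketch that pins the download cache to a directory you control, for example a persistent volume on your server. The helper names `cached_load_kwargs` and `load_cached` are hypothetical; `cache_dir` and `local_files_only` are existing keyword arguments of `from_pretrained`.

```python
from pathlib import Path


def cached_load_kwargs(cache_dir: str, offline: bool = False) -> dict:
    """Build from_pretrained keyword arguments that pin the cache location."""
    # Make sure the cache directory exists before the first download.
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    return {
        "cache_dir": cache_dir,       # downloaded files are stored here
        "local_files_only": offline,  # True: never contact the Hub
    }


def load_cached(model_id: str, cache_dir: str, offline: bool = False):
    """Load a model, reusing files already present in cache_dir."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(model_id, **cached_load_kwargs(cache_dir, offline))
```

If the cache directory survives between deploys, the first call downloads the files and later calls reuse them. Passing `offline=True` makes a missing file raise an error instead of triggering a new download.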


You don't need to download the model every time. When using AutoModel.from_pretrained, you can pass either the model name (it will be downloaded from Hugging Face) or a local directory path such as "./modelpath", in which case the model is loaded from that directory.


from transformers import AutoModel

# Load the model from a local directory instead of downloading it from the Hub
model = AutoModel.from_pretrained('./my-model-directory')
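As a rough sketch of how the two options can be combined: load from the local directory when it already holds a saved model, otherwise download once and save a copy for later deploys. The function names `resolve_model_source` and `load_model` are hypothetical, and checking for `config.json` relies on the fact that `save_pretrained` writes that file alongside the weights.

```python
from pathlib import Path


def resolve_model_source(local_dir: str, hub_id: str) -> str:
    """Return local_dir if it looks like a saved model, otherwise the Hub id."""
    # save_pretrained writes a config.json into the target directory.
    if (Path(local_dir) / "config.json").is_file():
        return local_dir
    return hub_id


def load_model(local_dir: str, hub_id: str):
    """Load from disk when possible; download and save a local copy otherwise."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModel

    source = resolve_model_source(local_dir, hub_id)
    model = AutoModel.from_pretrained(source)
    if source == hub_id:
        # First run: persist the model so later deploys load from local_dir.
        model.save_pretrained(local_dir)
    return model
```

A call would look like `load_model('./my-model-directory', '<model name on the Hub>')`, with the second argument replaced by whichever model you actually use.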
