What is the best way to serve a Hugging Face model with an API?

yoavz · July 25, 2020, 2:11pm

You have a few different options. Here are some, in increasing order of difficulty:

1. You can use the Hugging Face Inference API via the Model Hub if you are just looking for a demo.
2. You can use a hosted model deployment platform: GCP AI Platform Prediction, SageMaker, or https://modelzoo.dev/. Full disclosure: I am the developer behind Model Zoo, and I'm happy to give you some credits for experimentation.
3. You can roll your own model server with something like https://fastapi.tiangolo.com/ and deploy it on a generic serving platform like AWS Elastic Beanstalk or Heroku. This is the most flexible option, but also the most work.
