Productionizing HuggingFace Transformers?

What’s a common reference architecture for companies that use sentence-transformers via Hugging Face in production?

i was thinking:

API Gateway → queue → serverless function (running the sentence-transformers model)

Is it best to co-locate the model files with my Lambda (inside its VPC)? Looking for any and all best practices.
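
Here’s a rough sketch of the serverless piece I have in mind (just a sketch: the model path and payload shape are placeholders, and I’m assuming the model files are baked into the container image or sit on an EFS mount, so the load cost is only paid on cold start):

```python
import json

from sentence_transformers import SentenceTransformer

# Load once per container, outside the handler, so warm invocations
# skip model initialization. MODEL_PATH is a placeholder: it could
# point at a directory baked into the container image or an EFS
# mount attached to the function.
MODEL_PATH = "/opt/ml/model"  # hypothetical location
model = SentenceTransformer(MODEL_PATH)

def handler(event, context):
    # Assumes the queue delivers a batch of records, each with a "text" field.
    texts = [json.loads(record["body"])["text"] for record in event["Records"]]
    embeddings = model.encode(texts)  # returns a numpy array, one row per text
    return {
        "statusCode": 200,
        "body": json.dumps([e.tolist() for e in embeddings]),
    }
```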

Hi there! I think the things it depends on most are:

  1. Your company’s existing stack
  2. Your use-case (expected load, real-time vs. batched, etc.)

Can you share a bit more about what those look like in your situation? Without that context it’s a bit difficult (IMO) to give concrete recommendations. I could point you towards our Inference API service, for example (Inference API - Hugging Face), which lets you offload inference to our infrastructure entirely; there’s a quick sketch of what a call looks like below. Or you could take an approach like the one outlined here: How to Deploy NLP Models in Production - neptune.ai. Some companies set up full CI/CD pipelines if they need to continuously monitor, retrain, and redeploy their models (Continuous Delivery for Machine Learning).
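
To give a feel for the Inference API route, here’s a minimal sketch of fetching sentence embeddings over HTTP using the feature-extraction pipeline endpoint (the model ID is just an example and the token is a placeholder):

```python
import requests

# Example model; swap in whichever sentence-transformers checkpoint you use.
MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
API_URL = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{MODEL_ID}"
HEADERS = {"Authorization": "Bearer <your-hf-api-token>"}  # placeholder token

def embed(texts):
    # The feature-extraction pipeline returns one embedding per input text.
    response = requests.post(API_URL, headers=HEADERS, json={"inputs": texts})
    response.raise_for_status()
    return response.json()

print(embed(["How do I deploy sentence transformers?"]))
```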

If you can share more details about your use-case, I can definitely try to give more specific recommendations!