Productionizing HuggingFace Transformers?

Hi there! I think the two things it depends on most are:

  1. Your company’s existing stack
  2. Your use-case (expected load, real-time vs. batched, etc.)

Can you share a bit more about what those look like in your situation? Without that, it's (IMO) difficult to give concrete recommendations. For example, I could point you towards our Inference API service (Inference API - Hugging Face), which lets you offload inference to our infrastructure. Or you could self-host, taking an approach like the one outlined here: How to Deploy NLP Models in Production - neptune.ai. Some companies even set up full CI/CD pipelines when they need to continuously monitor, retrain, and redeploy their models (Continuous Delivery for Machine Learning).
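
To make those options a bit more concrete, here are two minimal sketches. First, calling the hosted Inference API over HTTP; the model id is just an example and `hf_xxx` is a placeholder for your own API token:

```python
import requests

# Any hosted model id works here; this sentiment model is just an example.
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder: your HF API token

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

print(query({"inputs": "We just shipped transformers to production!"}))
```

And here's a bare-bones version of the self-hosted route, wrapping a `pipeline` in a small FastAPI app (FastAPI is assumed here purely for illustration; the neptune.ai article covers the surrounding concerns like batching, monitoring, and scaling):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup rather than per request.
classifier = pipeline("sentiment-analysis")

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest):
    # Returns e.g. {"label": "POSITIVE", "score": 0.99}
    return classifier(req.text)[0]
```

You'd run that with something like `uvicorn app:app` behind whatever load balancer your stack already uses, but again, the right choice really depends on the two points above.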

If you can share more about your use-case, I'm definitely happy to give more specific recommendations!
