Use TF or PyTorch?
For PyTorch, TorchServe or Pipelines with something like Flask?
You have a few different options; here are some, in increasing order of difficulty:
@yoavz Hey I am also looking for an answer regarding this, can you give more reference or tutorial regarding this? Thank you
Sure – here are more links for each path:
Interested in model serving too. I don’t think that FastAPI stack is what we want - it’s good for a quickstart, but it’s preferable to have FastAPI serve your web API and use a job queue (e.g. RabbitMQ) for submitting expensive GPU jobs. I currently have an EC2 instance I spin up on demand from the FastAPI server, submit the job, receive the results, and send them to the client. Alternatively, you can use AWS Batch with a Dockerfile for your transformers models. But in the spin-up case, scaling is a real pain; and in the Batch case, there’s huge overhead in the Batch job coming online just for inference.
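Roughly what I mean, as a sketch - FastAPI only enqueues work, and a Celery worker with a RabbitMQ broker owns the GPU. All names (task, model, broker URL) here are illustrative, not an exact setup:

```python
# worker.py - runs on the GPU box; loads the model once per worker process
from celery import Celery
from transformers import pipeline

celery_app = Celery("inference", broker="amqp://guest@localhost//", backend="rpc://")

classifier = pipeline("sentiment-analysis")  # placeholder task/model

@celery_app.task
def classify(text):
    # pipeline output is a list of dicts, which serializes cleanly
    return classifier(text)


# api.py - lightweight web tier, no GPU needed
from celery.result import AsyncResult
from fastapi import FastAPI
# from worker import celery_app, classify

api = FastAPI()

@api.post("/predict")
def predict(text: str):
    job = classify.delay(text)  # enqueue the expensive GPU job
    return {"job_id": job.id}

@api.get("/result/{job_id}")
def result(job_id: str):
    res = AsyncResult(job_id, app=celery_app)
    return {"status": res.status, "result": res.result if res.ready() else None}
```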
What we really want is proper cloud hosting of models, e.g. via GCP AI Platform. I’m not sure if Model Zoo serves this purpose; I’ll check it out ASAP. I do see https://github.com/maxzzze/transformers-ai-platform/tree/master/models/classification, but its last commit is Feb 10, and my quick scan of the repo makes me think it might be a bit rigid and will take a fair bit of tinkering for flexible use cases.
What would really be handy is a tutorial on deploying transformers models to GCP AI Platform: how to prepare and upload a model; how to separate the surrounding code (model prep, tokenization prep, etc.); how to deal with their 500 MB model quota; all that stuff. Ideally there’d be a fairly first-class Hugging Face exporter, or an on-site tutorial.
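For the prepare-and-upload piece, something like this sketch is what I have in mind (bucket, model, and version names are placeholders, and depending on your transformers/TF versions you may need to attach an explicit serving signature):

```python
import tensorflow as tf
from transformers import DistilBertTokenizerFast, TFDistilBertForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFDistilBertForSequenceClassification.from_pretrained(model_id)
tokenizer = DistilBertTokenizerFast.from_pretrained(model_id)

# Export only the model graph as a SavedModel - that is what AI Platform serves.
tf.saved_model.save(model, "export")

# The tokenizer stays with the client code and runs before every request.
tokenizer.save_pretrained("client/tokenizer")

# Then, from the shell (bucket / model / version names are placeholders):
#   gsutil cp -r export gs://my-bucket/distilbert-sst2
#   gcloud ai-platform models create distilbert_sst2 --regions us-central1
#   gcloud ai-platform versions create v1 --model distilbert_sst2 \
#       --origin gs://my-bucket/distilbert-sst2/export --runtime-version 2.3
```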
Actually, this could be a business proposition for Hugging Face: host your models, and charge for API calls! We’d develop locally to get things sorted, but then switch to the API so we don’t have to worry about instance scaling and the like. Anyway, I’ll check out Model Zoo in case that’s what it does.
I have shared an example using TorchServe (for the NER use case), but it can be extended to other tasks by using different pipelines.
blogpost and repo
Includes a demo UI too!
(can’t include more links because I’m a new user on this forum…just refer to the blogpost)
Hope it helps~
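The gist of the handler is something like this (a sketch, not the exact code from the blogpost; file and model names are placeholders):

```python
# handler.py - rough sketch of a TorchServe custom handler wrapping a
# transformers pipeline (NER here, but any pipeline task would work)
from ts.torch_handler.base_handler import BaseHandler
from transformers import pipeline

class TransformersPipelineHandler(BaseHandler):
    def initialize(self, context):
        # TorchServe unpacks the model artifacts (config, weights, tokenizer
        # files) into the model directory before calling initialize()
        model_dir = context.system_properties.get("model_dir")
        self.pipe = pipeline("ner", model=model_dir, tokenizer=model_dir)
        self.initialized = True

    def preprocess(self, data):
        # each request arrives wrapped; the text is under "body" or "data"
        texts = []
        for row in data:
            payload = row.get("body") or row.get("data")
            if isinstance(payload, (bytes, bytearray)):
                payload = payload.decode("utf-8")
            texts.append(payload)
        return texts

    def inference(self, texts):
        return [self.pipe(t) for t in texts]

    def postprocess(self, outputs):
        # cast numpy scalars to plain floats so the response is JSON-serializable
        return [
            [{**ent, "score": float(ent["score"])} for ent in doc]
            for doc in outputs
        ]
```

You’d then roughly package it with `torch-model-archiver`, pointing `--handler` at this file and passing the model/tokenizer files via `--extra-files`, and serve the resulting `.mar` with `torchserve`. The blogpost walks through the details.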
Is there a way to serve a Hugging Face BERT model with TF Serving such that TF Serving handles the tokenization along with inference? Any related documentation or blog posts?
Hi @anubhavmaity!
Thanks for your question! Unfortunately, it is currently not possible to integrate the tokenization process along with inference directly inside a saved model. Nevertheless, it is part of our plans to make this available, and we are currently rethinking the way saved models are handled in transformers.
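In the meantime, the usual workaround is to keep tokenization on the client and send ready-made tensors to TF Serving. A rough sketch, assuming a BERT SavedModel deployed under the name `bert` on the default REST port (the exact input names must match your exported serving signature):

```python
import requests
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# tokenize client-side; without return_tensors the encodings are plain lists
enc = tokenizer("Serving BERT is fun")

payload = {
    "instances": [
        {
            "input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
            "token_type_ids": enc["token_type_ids"],
        }
    ]
}

# TF Serving's REST predict endpoint (port 8501 by default)
resp = requests.post("http://localhost:8501/v1/models/bert:predict", json=payload)
print(resp.json())  # raw logits; argmax / label mapping also happens client-side
```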
I know this is old, but have you seen this? https://huggingface.co/pricing
Basically exactly what you’re asking for. We’re hosting your models and running them at scale!
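Calling a hosted model is just an HTTP request. A minimal sketch (model id and token are placeholders):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer <your-api-token>"}

resp = requests.post(API_URL, headers=headers, json={"inputs": "I love this!"})
print(resp.json())
```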