What is the best way to serve a Hugging Face model with an API?

Should I use TF or PyTorch?

For PyTorch, should I use TorchServe, or Pipelines behind something like Flask?


You have a few different options; here are some in increasing order of difficulty:

  1. You can use the Hugging Face Inference API via the Model Hub if you are just looking for a demo.
  2. You can use a hosted model deployment platform: GCP AI Platform predictions, SageMaker, https://modelzoo.dev/. Full disclosure: I am the developer behind Model Zoo, and I’m happy to give you some credits for experimentation.
  3. You can roll your own model server with something like https://fastapi.tiangolo.com/ and deploy it on a generic serving platform like AWS Elastic Beanstalk or Heroku. This is the most flexible option.

@yoavz Hey, I am also looking for an answer to this. Can you share more references or a tutorial? Thank you!


Sure – here are more links for each path:

  1. Hugging Face Model Hub: https://huggingface.co/transformers/model_sharing.html
  2. Model Zoo: https://docs.modelzoo.dev/quickstart/transformers.html
  3. Roll your own deployment stack: https://github.com/curiousily/Deploy-BERT-for-Sentiment-Analysis-with-FastAPI

Interested in model serving too. I don’t think that FastAPI stack is what we want – it’s good for a quickstart, but it’s preferable to have FastAPI serve your web API and a job queue (e.g. RabbitMQ) for submitting expensive GPU jobs. I currently have an EC2 instance I spin up on demand from the FastAPI server: submit the job, receive the results, send them to the client. Alternatively, you can use AWS Batch with a Dockerfile for your transformers models. But in the spin-up case, scaling is a real pain; and in the Batch case, there’s huge overhead in the Batch job coming online just for inference.
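The web-API / job-queue split described above can be sketched with the standard library, using an in-process `queue.Queue` as a stand-in for a real broker like RabbitMQ. The `run_inference` function is a hypothetical placeholder for the expensive GPU call.

```python
# Sketch of the API-server / worker split: the API thread enqueues jobs,
# a worker thread (standing in for a GPU box) consumes and runs them.
# queue.Queue is a stand-in for a real broker such as RabbitMQ.
import queue
import threading
import uuid

jobs = queue.Queue()
results = {}


def run_inference(text):
    # Hypothetical placeholder for the expensive GPU call
    # (e.g. a transformers pipeline on the EC2 instance).
    return {"label": "POSITIVE", "input": text}


def worker():
    while True:
        job_id, text = jobs.get()
        if job_id is None:  # shutdown sentinel
            break
        results[job_id] = run_inference(text)
        jobs.task_done()


def submit(text):
    # What the FastAPI endpoint would do: enqueue and return a job id
    # the client can poll for results.
    job_id = str(uuid.uuid4())
    jobs.put((job_id, text))
    return job_id


threading.Thread(target=worker, daemon=True).start()
```

In a real deployment the queue would be durable and the worker a separate process or machine, but the submit/poll shape stays the same.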

What we really want is proper cloud hosting of models, e.g. via GCP AI Platform. I’m not sure if Model Zoo serves this purpose; I’ll check it out ASAP. I do see https://github.com/maxzzze/transformers-ai-platform/tree/master/models/classification, but its last commit is Feb 10, and my quick scan of the repo makes me think it might be a bit rigid and will take a fair bit of tinkering for flexible use cases.

What would really be handy is a tutorial on deploying transformers models to GCP AI Platform: how to prepare and upload models; how to separate the surrounding code (model prep, tokenization prep, etc.); how to deal with their 500 MB model quota; all that stuff. Ideally there’d be a fairly first-class Hugging Face exporter, or an on-site tutorial.

Actually, this could be a business proposition for Hugging Face: host your models and charge for API calls! We’d develop locally to get things sorted, then switch to the API so we don’t have to worry about instance scaling and the like. Anyway, I’ll check out Model Zoo in case that’s what it does.


I have shared an example using TorchServe (for the NER use case), but it can be extended to other task types by using different pipelines.
blog post and repo
It includes a demo UI too!
(I can’t include more links because I’m a new user on this forum… just refer to the blog post.)
Hope it helps!


Is there a way to serve the Hugging Face BERT model with TF Serving such that TF Serving handles the tokenization along with inference? Is there any related documentation or blog post?

@jplu might help with this


Hi @anubhavmaity!

Thanks for your question! Unfortunately, it is currently not possible to integrate the tokenization process into a saved model alongside inference. Nevertheless, making this available is part of our plans, and we are currently rethinking the way saved models are handled in transformers :slight_smile:
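Since tokenization cannot live inside the SavedModel today, it has to run client-side before calling TF Serving. A minimal sketch of building the REST predict payload from token ids; the instance field names (`input_ids`, `attention_mask`) match BERT-style inputs, but the exact names depend on your exported model’s signature, so treat them as assumptions.

```python
# Client-side workaround: tokenize first, then send token ids to TF Serving.
# TF Serving's REST predict API expects a JSON body of {"instances": [...]}.
import json


def build_tf_serving_request(input_ids, attention_mask):
    # Field names must match the exported SavedModel's serving signature;
    # input_ids / attention_mask are typical for BERT-style models.
    return {
        "instances": [
            {"input_ids": input_ids, "attention_mask": attention_mask}
        ]
    }


# In practice the ids would come from a transformers tokenizer, e.g.:
#   enc = AutoTokenizer.from_pretrained("bert-base-uncased")("some text")
#   payload = build_tf_serving_request(enc["input_ids"], enc["attention_mask"])
payload = build_tf_serving_request([101, 2023, 102], [1, 1, 1])
body = json.dumps(payload)
```

The resulting `body` would be POSTed to a TF Serving endpoint such as `http://host:8501/v1/models/<name>:predict` (URL illustrative).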


I know this is old, but have you seen this? https://huggingface.co/pricing

It’s basically exactly what you’re asking for: we host your models and run them at scale!