Hi community,
I came across the nice article by @mfuntowicz: Hugging Face – The AI community building the future.
It sounds really interesting how easily you can benchmark your BERT transformer model from the CLI with Facebook AI Research’s Hydra configuration library.
Is it possible, however, to easily test it on cloud services such as AWS, and how would one deploy it?
Thanks!
Hey @Matthieu,
Thanks for reading and posting here.
Indeed, everything in the blog was run on AWS c5.metal instances.
The way I’m currently using it:
git clone https://github.com/huggingface/tune
cd tune
pip install -r requirements.txt
export PYTHONPATH=src
python src/main.py --multirun backend=pytorch batch=1 sequence_length=128,256,512
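As a side note, Hydra’s --multirun accepts comma-separated sweeps for each override, so a whole grid can be covered in a single call; the values below are purely illustrative, not from the blog post:
# illustrative sweep values (assumption), same overrides as above
python src/main.py --multirun backend=pytorch batch=1,4,8 sequence_length=128,256,512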
The overall framework is quite new and I’ll be improving the UX in the coming days; sorry for the rough user experience.
Morgan
Hi @mfuntowicz
Thanks again for the great article. Quoting your reply:
Indeed, everything in the blog was run on AWS c5.metal instances.
The way I’m currently using it:
git clone https://github.com/huggingface/tune
cd tune
pip install -r requirements.txt
export PYTHONPATH=src
python src/main.py --multirun backend=pytorch batch=1 sequence_length=128,256,512
- So you ran these command lines directly on the c5.metal instance? You didn’t need to first install a Docker image with the OS and pip/Python packages on it?
- With this overall framework you can simulate the impact of different hardware parameters on latency/throughput. But transformers are generally encapsulated within a Docker image exposing an API before deployment on cloud services. How could this benchmark simulate the real latency/throughput of the deployed Docker image?
Matthieu
- So you ran these command lines directly on the c5.metal instance? You didn’t need to first install a Docker image with the OS and pip/Python packages on it?
Yes, exactly.
- With this overall framework you can simulate the impact of different hardware parameters on latency/throughput. But transformers are generally encapsulated within a Docker image exposing an API before deployment on cloud services. How could this benchmark simulate the real latency/throughput of the deployed Docker image?
That’s an interesting point. We do not provide a testbed for integrated solutions (yet?). Still, all the knobs discussed in this first part, and the ones coming in the second part, can be leveraged within a container and should yield the same performance benefits highlighted in the blog posts.
Of course, it doesn’t simulate the latency overhead of a web server handling incoming requests and/or dynamic batching, as NVIDIA Triton does, for instance.
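To give a concrete picture, here is a minimal, hypothetical sketch of carrying the same kind of CPU knobs over to a container launch; the image name, port, and 8-core values are placeholders, not something from the blog:
# Hypothetical sketch: pin the container to specific cores and align the
# intra-op thread count with them. "my-bert-api:latest" is a placeholder image.
docker run --rm \
  --cpuset-cpus=0-7 \
  -e OMP_NUM_THREADS=8 \
  -p 8080:8080 \
  my-bert-api:latest
Measuring the true end-to-end latency of such a container (request parsing, serialization, batching) would still require a separate load test against the running API.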
Hope it helps,
Morgan