Hi everyone!
I’ve been working on putting GPU-accelerated transformer inference into production using Docker, and I thought it would be helpful to share how I did it (gist linked below):
https://gist.github.com/lmwilkin/359ef8ada2eb1766d049719e9fc7053a
You’ll need nvidia-docker (the NVIDIA Container Toolkit) installed so the container can see the GPU.
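With a recent Docker and the toolkit installed, exposing the GPU is just a flag on `docker run` (the image name here is a placeholder):

```bash
# --gpus all makes every host GPU visible inside the container
docker run --gpus all -p 8000:8000 my-inference-image
```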
I used FastAPI to set up a basic REST interface. The Dockerfile would also be a good place to add an environment variable for something like the model name, if you want to set that dynamically; there's a sketch of that idea below.
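For anyone who wants the shape of it without opening the gist, here's a minimal sketch (not the exact code from the gist) assuming you load models with the Hugging Face transformers pipeline; `MODEL_NAME` and the sentiment-analysis task are just example choices:

```python
import os

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# MODEL_NAME would be set via ENV in the Dockerfile; the default here is just an example.
MODEL_NAME = os.environ.get("MODEL_NAME", "distilbert-base-uncased-finetuned-sst-2-english")

app = FastAPI()

# device=0 puts the model on the first GPU the container can see.
classifier = pipeline("sentiment-analysis", model=MODEL_NAME, device=0)


class PredictRequest(BaseModel):
    text: str


@app.post("/predict")
def predict(req: PredictRequest):
    # The pipeline returns a list of results, one per input.
    return classifier(req.text)[0]
```

In the Dockerfile that would pair with an `ENV MODEL_NAME=...` line, which you can then override at launch with `docker run -e MODEL_NAME=...`.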
My plan is to try to write a container that can swap out models per request, which I’ll share if I can get it working.
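In case it's useful, here's the rough shape I'm imagining: a cached loader keyed on a model name sent with each request. This is untested sketch code, and the hard part in practice is evicting models without running the GPU out of memory:

```python
from functools import lru_cache

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()


# Keep only a couple of models resident at once so GPU memory doesn't blow up.
@lru_cache(maxsize=2)
def get_pipeline(model_name: str):
    return pipeline("sentiment-analysis", model=model_name, device=0)


class PredictRequest(BaseModel):
    text: str
    model_name: str


@app.post("/predict")
def predict(req: PredictRequest):
    return get_pipeline(req.model_name)(req.text)[0]
```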
Hope this helps!