Containerizing transformers with Docker and FastAPI

Hi everyone!

I’ve been working on putting GPU-accelerated transformer inference into production using Docker, and I thought it would be helpful to share how I did it (link to gist).

You’ll need to have nvidia-docker installed so that the container can access the GPU.

I used FastAPI to set up a basic REST interface. The Dockerfile is also a good place to add an environment variable for something like the model name, if you want to set that dynamically.
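To make the environment variable idea concrete, here is a minimal sketch of how the service could read its configuration and build the app. It is not the code from the gist: the variable names `MODEL_NAME`, `TASK` and `DEVICE`, the `/predict` route, the default model and the helpers `load_settings` and `create_app` are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_name: str
    task: str
    device: int  # transformers pipeline convention: 0 = first GPU, -1 = CPU


def load_settings(env=os.environ) -> Settings:
    """Read configuration from environment variables, with fallback defaults."""
    return Settings(
        # Hypothetical variable names; the default model is only an example.
        model_name=env.get("MODEL_NAME", "distilbert-base-uncased-finetuned-sst-2-english"),
        task=env.get("TASK", "sentiment-analysis"),
        device=int(env.get("DEVICE", "0")),
    )


def create_app(settings: Settings):
    """Build the FastAPI app; the heavy imports are deferred until it is built."""
    from fastapi import Body, FastAPI
    from transformers import pipeline

    app = FastAPI()
    # Load the model once when the app is created, not on every request.
    classifier = pipeline(settings.task, model=settings.model_name, device=settings.device)

    @app.post("/predict")
    def predict(text: str = Body(..., embed=True)):
        # Expects a JSON body like {"text": "some input"}.
        return classifier(text)

    return app
```

In the module that uvicorn serves you would then write something like `app = create_app(load_settings())`. The Dockerfile can set a default with `ENV MODEL_NAME=...`, and `docker run -e MODEL_NAME=...` overrides it without rebuilding the image.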

My plan is to try to write a container that can swap out models on request, which I’ll share if I can get it working.
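One possible shape for the model swapping, purely as a sketch and not something I have running in a container: keep a small cache of loaded models and evict the least recently used one when a new model is requested. The names `ModelCache` and `load_pipeline` and the `max_models` limit are hypothetical, and the default loader assumes a transformers `pipeline` on the first GPU.

```python
from collections import OrderedDict
from typing import Any, Callable


def load_pipeline(model_name: str) -> Any:
    """Default loader (assumption): a transformers pipeline on the first GPU."""
    from transformers import pipeline  # deferred: only needed when a model is loaded

    return pipeline("sentiment-analysis", model=model_name, device=0)


class ModelCache:
    """Keep at most `max_models` loaded, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], Any] = load_pipeline, max_models: int = 2):
        self._loader = loader
        self._max_models = max_models
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, model_name: str) -> Any:
        if model_name in self._models:
            self._models.move_to_end(model_name)  # mark as most recently used
            return self._models[model_name]
        model = self._loader(model_name)
        self._models[model_name] = model
        if len(self._models) > self._max_models:
            self._models.popitem(last=False)  # drop the least recently used model
        return model
```

A request handler could then call `cache.get(requested_name)(text)`. Two caveats: this sketch is not thread-safe, and dropping the reference does not guarantee that GPU memory is released right away (with PyTorch, `torch.cuda.empty_cache()` may also be needed).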

Hope this helps!


Hey @lmwilkin, great! Thank you for sharing this.