Hi everyone!
I’ve been working on putting GPU-accelerated transformer inference into production using Docker, and I thought it would be helpful to share how I did it (gist linked below):
https://gist.github.com/lmwilkin/359ef8ada2eb1766d049719e9fc7053a
You’ll need nvidia-docker (the NVIDIA Container Toolkit) installed so the container can see the GPU.
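With a recent Docker and the toolkit installed, exposing the GPU is just a flag on `docker run` (the image name here is a placeholder):

```bash
# --gpus all makes every host GPU visible inside the container
docker run --gpus all -p 8000:8000 my-inference-image
```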
I used FastAPI to set up a basic REST interface. The Dockerfile would also be a good place to add an environment variable for something like the model name, if you want to set that dynamically; there's a sketch of that idea below.
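For anyone who wants the shape of it without opening the gist, here's a minimal sketch (not the exact code from the gist) assuming you load models with the Hugging Face transformers pipeline; `MODEL_NAME` and the sentiment-analysis task are just example choices:

```python
import os

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# MODEL_NAME would be set via ENV in the Dockerfile; the default here is just an example.
MODEL_NAME = os.environ.get("MODEL_NAME", "distilbert-base-uncased-finetuned-sst-2-english")

app = FastAPI()

# device=0 puts the model on the first GPU the container can see.
classifier = pipeline("sentiment-analysis", model=MODEL_NAME, device=0)


class PredictRequest(BaseModel):
    text: str


@app.post("/predict")
def predict(req: PredictRequest):
    # The pipeline returns a list of results, one per input.
    return classifier(req.text)[0]
```

In the Dockerfile that would pair with an `ENV MODEL_NAME=...` line, which you can then override at launch with `docker run -e MODEL_NAME=...`.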
My plan is to try to write a container that can swap out models per request, which I’ll share if I can get it working.
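In case it's useful, here's the rough shape I'm imagining: a cached loader keyed on a model name sent with each request. This is untested sketch code, and the hard part in practice is evicting models without running the GPU out of memory:

```python
from functools import lru_cache

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()


# Keep only a couple of models resident at once so GPU memory doesn't blow up.
@lru_cache(maxsize=2)
def get_pipeline(model_name: str):
    return pipeline("sentiment-analysis", model=model_name, device=0)


class PredictRequest(BaseModel):
    text: str
    model_name: str


@app.post("/predict")
def predict(req: PredictRequest):
    return get_pipeline(req.model_name)(req.text)[0]
```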
Hope this helps!