Memory keeps growing when called from Uvicorn/FastAPI

mjazwinski · August 26, 2022, 8:34am

Hi,

I am experimenting with Hugging Face transformers using sentence-to-sentence translation models (Helsinki_en_to_de, Helsinki_de_to_en). Whether I call the tokenizer and then the model manually or go through a pipeline, the memory used by Uvicorn grows as soon as I run a few concurrent inferences. The more concurrent requests there are, the more dramatic the growth, and the memory is never released. Uvicorn starts well below 1 GB and goes up to 1.1 GB after one model is loaded; when I then run inferences concurrently it can reach 6 GB (I stopped at that point).

I tried using smaller batches (max_length=128). Memory usage still kept increasing, just more slowly.

I tried to profile memory with tracemalloc and pympler, but they do not pinpoint any cause (that is, no single collection of objects is growing that much). When using pympler, a warning was printed:

Relevant parts of my code:

tokenized = tokenizer(sentences, return_tensors="pt", padding=True)
translated_encoded = model.generate(**tokenized)
translated = [tokenizer.decode(t, skip_special_tokens=True) for t in translated_encoded]
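
The tracemalloc check mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual service: `fake_inference` and `top_growth` are hypothetical names, and `fake_inference` stands in for the tokenizer and `model.generate` calls so that the example runs without any model. Note that tracemalloc only sees allocations made through Python's own allocators, so memory allocated natively by a library would not show up in a comparison like this, which may be one reason the profilers report nothing.

```python
import tracemalloc


def fake_inference(batch):
    # Hypothetical stand-in for tokenizer(...) followed by model.generate(...).
    return [sentence.upper() for sentence in batch]


def top_growth(workload, repeats=50, limit=5):
    """Run `workload` repeatedly and return the largest per-line allocation diffs."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(repeats):
        workload()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to returns StatisticDiff objects, largest absolute change first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth(lambda: fake_inference(["hello world"] * 8)):
        print(stat)
```

If the numbers reported here stay flat while the process's resident memory keeps climbing, the growth is probably happening outside Python-tracked objects.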

I would be grateful for any suggestions.
