Blenderbot 1.0B Distilled eats up memory over many inferences

alu13 · March 7, 2022, 10:15pm

Hi, I’ve noticed that over the course of many inferences, the Blenderbot 1.0B Distilled model keeps allocating more GPU memory until the GPU eventually crashes. My project only uses single-turn inferences, so I was wondering how to stop Blenderbot from continuously allocating memory. Thanks!
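Below is a minimal sketch of stateless single-turn inference. It assumes the Hugging Face `transformers` checkpoint `facebook/blenderbot-1B-distill` and PyTorch. It guards against three common causes of growing GPU memory: an autograd graph built during generation, GPU tensors kept alive between calls, and reloading the model for every request. Whether any of these explains the behaviour described above is an assumption, not a confirmed diagnosis. `load_blenderbot` and `single_turn_reply` are hypothetical helper names.

```python
import torch


def load_blenderbot(name="facebook/blenderbot-1B-distill"):
    """Load the tokenizer and model once; reuse them for every request."""
    # Imported here so single_turn_reply can be used without transformers loaded.
    from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = BlenderbotTokenizer.from_pretrained(name)
    # eval() turns off dropout; the weights are moved to the device exactly once.
    model = BlenderbotForConditionalGeneration.from_pretrained(name).to(device).eval()
    return model, tokenizer


def single_turn_reply(model, tokenizer, text, max_new_tokens=60):
    """Generate one reply without carrying any state over to the next call."""
    device = next(model.parameters()).device
    # Encode only the current utterance, so no conversation history accumulates.
    inputs = tokenizer([text], return_tensors="pt").to(device)
    # inference_mode() disables autograd, so generate() builds no graph to keep alive.
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Return a plain Python string so no GPU tensor outlives this call.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Call `load_blenderbot()` once at startup and `single_turn_reply(model, tokenizer, text)` once per request. Printing `torch.cuda.memory_allocated()` between calls is one way to check whether usage stays flat. Note that `torch.cuda.empty_cache()` only returns cached blocks to the driver and does not free tensors that are still referenced.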
