Gradually increasing CPU load when using a sentence-embedding model with k-means

I have an ML-based production application built with Flask and deployed on a GCP server with Gunicorn workers. Each incoming request contains a text sentence.

It uses Sentence Transformers (the all-MiniLM-L6-v2 model), loaded globally once, to create an embedding of the incoming text, and then a pre-trained k-means model (also loaded globally) to map the embedding to an intent cluster. The goal is to find the intent of the sentence.
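For reference, here is a simplified sketch of the relevant code (endpoint name and model file path are illustrative, not the exact production code):

```python
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer
import joblib

app = Flask(__name__)

# Loaded globally, once per Gunicorn worker
model = SentenceTransformer("all-MiniLM-L6-v2")
kmeans = joblib.load("kmeans_intents.pkl")  # pre-trained k-means (illustrative path)

@app.route("/intent", methods=["POST"])
def intent():
    text = request.get_json()["text"]
    embedding = model.encode([text])            # shape (1, 384)
    cluster_id = int(kmeans.predict(embedding)[0])
    return jsonify({"intent_cluster": cluster_id})
```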

I have ample resources, the request rate is constant, and the incoming texts are similar, yet the CPU load gradually increases every day. The average response time was around 200 ms on day 1; after 10 days it is now around 400 ms.

I have tried deleting the embedding variable with `del` in the code itself, and also forcing the Python garbage collector with `gc.collect()` in a thread that runs after the main request handling completes, but the issue persists.
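Roughly what that cleanup attempt looks like (again simplified; the thread does nothing but run `gc.collect()` once the response work is done):

```python
import gc
import threading

@app.route("/intent", methods=["POST"])
def intent():
    text = request.get_json()["text"]
    embedding = model.encode([text])
    cluster_id = int(kmeans.predict(embedding)[0])

    # Explicitly drop the embedding and trigger a GC pass in a background thread
    del embedding
    threading.Thread(target=gc.collect).start()

    return jsonify({"intent_cluster": cluster_id})
```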

One thing I have noticed: if I don't use `del` and `gc.collect()`, the available RAM gradually goes down (memory usage keeps growing). With both in place, RAM usage stays constant, but now CPU usage climbs day by day, which drives up the load and the response time.

I have spent weeks trying to debug this issue without finding a solution; any help would be appreciated.