Caching when using HuggingFace Endpoint

Hi,

I am currently building my own RAG application, which I have deployed on a Hugging Face Space. After using it for a while, I received the following error:

Request failed during generation: Server Error: Out of cache blocks: asked 3117, only 2916 free blocks 

That is when I realized that I had no idea how caching was being handled.
I am using the following code to run inference on the model:

# (import path depends on the LangChain version; older releases
# export it from langchain_community.llms instead)
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id=llm_model,
    temperature=temperature,
    max_new_tokens=max_tokens,
    streaming=True,
    task="text2text-generation",
    top_k=top_k,
    # top_p=0.95,
    repetition_penalty=1.0,
)
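
For context, I am not enabling any client-side caching myself. As far as I understand, LangChain's optional LLM cache has to be turned on explicitly, something like the sketch below (assuming the set_llm_cache / InMemoryCache API; I have not verified this against my installed version):

from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Optional client-side cache: repeated calls with the identical
# prompt and LLM parameters are answered from memory instead of
# hitting the endpoint again.
set_llm_cache(InMemoryCache())

Since I never call anything like this, I assume every request goes straight to the endpoint, which makes the "Out of cache blocks" server error even more confusing to me.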

Can you enlighten me on how caching is handled with HuggingFaceEndpoint and Hugging Face Spaces?

(Is Hugging Face using the data from our cache?)