Using LLM cache

Hi,

I’m working with an LLM that generates text in stages: it stops after each section, performs an external action, integrates the action’s result into the generated text, and then continues. This approach works, but the time it takes has become a problem.
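Here’s a rough sketch of my loop, assuming a Hugging Face transformers setup; the model name and the `perform_action()` helper are placeholders for my actual pipeline:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder for my model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def perform_action(section: str) -> str:
    # Placeholder for the external action run between sections.
    return "<result of the action>"

prompt = "Write a report, one section at a time.\n"
num_stages = 3

for stage in range(num_stages):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    # Keep only the newly generated section.
    section = tokenizer.decode(
        output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    result = perform_action(section)
    # The full text (prompt + section + action result) is re-fed at the next
    # stage, so the whole prefix gets re-processed from scratch every time.
    prompt = prompt + section + "\n" + result + "\n"
```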

I’ve tried using a static cache, but it actually increased processing time.
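For context, this is roughly how I enabled it (again assuming the transformers `generate()` API; `cache_implementation` is the flag I passed):

```python
# Same loop as above, but with a static KV cache enabled for generation.
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    cache_implementation="static",
)
```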

Is there a way to speed this up, for example by reusing the KV cache across stages instead of re-processing the full prefix each time?