I am using NLLB models from Hugging Face in my application. Currently the model is loaded on every API call, which results in high execution time.
I need to keep the loaded model in something like a cache so it can be reused across API calls. Does Hugging Face provide any built-in caching and locking functionality for loaded models, or do I need to implement this myself?
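For context, this is roughly the pattern I would otherwise hand-roll: a process-level cache with double-checked locking so the model is loaded at most once and shared by all request handlers. The `loader` callable here is illustrative; in my app it would wrap the actual `from_pretrained` calls for my NLLB checkpoint.

```python
import threading

_lock = threading.Lock()
_cache = {}

def get_cached(key, loader):
    """Thread-safe lazy singleton: loader() runs at most once per key."""
    if key in _cache:               # fast path, no lock once populated
        return _cache[key]
    with _lock:
        if key not in _cache:       # double-check under the lock
            _cache[key] = loader()
        return _cache[key]
```

Usage in a request handler would then look something like (model name illustrative):

```python
tok_and_model = get_cached(
    "nllb",
    lambda: (
        AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M"),
        AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M"),
    ),
)
```

Is there something in `transformers` (or an official companion library) that already does this, so I don't have to maintain it myself?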