Persistent models

How can I get my model to stay loaded in memory? I think the model is reloaded from disk every time I run this pipeline:

from transformers import pipeline
import time

start = time.time()
#generator = pipeline('text-generation', model='bigscience/bloom-560m')
#generator = pipeline('text-generation', model='gpt2')
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')
#generator = pipeline('text-generation', model='EleutherAI/gpt-j-6B')
text = generator("Albert Einstein was:", max_length=100, num_return_sequences=1)
print(text)
end = time.time()
print("Time consumed in working:", end - start)

Or alternatively, can I run a model as a service?

Never mind, I solved it.

I want to share the code that solves it: keep the process alive and reuse the already-loaded generator for later queries. The second query takes about 50% less time than the first, since it skips the model load.

from transformers import pipeline
import time

# First query: includes the one-time cost of loading the model.
start = time.time()
#generator = pipeline('text-generation', model='bigscience/bloom-560m')
#generator = pipeline('text-generation', model='gpt2')
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')
#generator = pipeline('text-generation', model='EleutherAI/gpt-j-6B')
text = generator("Albert Einstein was:", max_length=100, num_return_sequences=1)
print(text)
end = time.time()
print("Time consumed in working:", end - start)

# Second query: reuses the generator that is already in memory,
# so only inference time is measured.
start = time.time()
text = generator("Albert Einstein was:", max_length=100, num_return_sequences=1)
print(text)
end = time.time()
print("Time consumed in working:", end - start)
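To answer the "run it as a service" part of the question too: the same load-once pattern can be wrapped in a small HTTP server so any client can query the resident model without triggering a reload. Below is a minimal stdlib-only sketch; `load_model` here is a stand-in for the real `pipeline(...)` call (so the sketch runs without downloading weights), and the route and JSON shape are my own choices, not anything prescribed by transformers.

```python
# Sketch: keep a model resident in one process and serve it over HTTP.
# load_model() is a placeholder for the expensive transformers load, e.g.:
#   generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def load_model():
    # Stand-in "generator" mimicking the pipeline's return shape.
    return lambda prompt: [{"generated_text": prompt + " ..."}]

generator = load_model()  # loaded ONCE, lives for the life of the process

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = json.loads(self.rfile.read(length))["prompt"]
        body = json.dumps(generator(prompt)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging for the demo
        pass

# Port 0 lets the OS pick a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: every request hits the already-loaded generator.
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/",
    data=json.dumps({"prompt": "Albert Einstein was:"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)
server.shutdown()
```

In production you would more likely reach for an existing serving layer (FastAPI, or Hugging Face's text-generation-inference), but the principle is the same: the process that owns the model never exits between queries.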