How can I pass a large context to a pipeline once instead of resending it with every query?

I am trying to build a database query generator for my schema. To get accurate results, I pass some example user/assistant interactions so that the model follows the expected query format.

import os
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

schema = ...  # database schema (omitted here)
messages = [
    {"role": "user", "content": "Find John"},
    {"role": "assistant", "content": "{\"response.someset.subset.match_name\": \"John\"}"},
    # ... and about 10 more examples like this
]
current_directory = os.path.dirname("/workspace/Qwen2.5-Coder-1.5B-Instruct/")
model = AutoModelForCausalLM.from_pretrained(
    current_directory,  # previously "microsoft/Phi-3.5-mini-4k-instruct"
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
# tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-4k-instruct")
tokenizer = AutoTokenizer.from_pretrained(current_directory)
print("starting")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
# query is the question I want to run (not shown here)
messages.append({"role": "user", "content": query})
# generation_args is defined elsewhere (not shown here)
output = pipe(messages, **generation_args)

The problem is that passing a long schema and so many messages with every call increases the generation time a lot. Is there a way to pass them once and have the model remember them, instead of passing them again for each query?
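
One possible direction is to precompute the key/value cache for the fixed part of the prompt once and reuse a copy of it for every query, bypassing `pipeline` and calling `model.generate` directly. The following is only a minimal sketch, assuming a recent `transformers` release that provides `DynamicCache` and accepts a prefilled `past_key_values` in `generate`. The function names `shared_prefix_len`, `build_prefix_cache` and `generate_with_cache` are hypothetical helpers made up for this illustration, not library APIs.

```python
import copy


def shared_prefix_len(prefix_ids, full_ids):
    """Count how many leading token ids the two sequences have in common."""
    n = 0
    for a, b in zip(prefix_ids, full_ids):
        if a != b:
            break
        n += 1
    return n


def build_prefix_cache(model, tokenizer, prefix_messages):
    """Run the fixed messages through the model once.

    Returns the filled KV cache and the token ids it corresponds to.
    """
    # Imported lazily so the pure-Python helper above works without torch.
    import torch
    from transformers import DynamicCache

    prefix_text = tokenizer.apply_chat_template(
        prefix_messages, tokenize=False, add_generation_prompt=False
    )
    prefix = tokenizer(
        prefix_text, return_tensors="pt", add_special_tokens=False
    ).to(model.device)
    with torch.no_grad():
        cache = model(**prefix, past_key_values=DynamicCache()).past_key_values
    return cache, prefix["input_ids"][0].tolist()


def generate_with_cache(model, tokenizer, cache, prefix_ids,
                        prefix_messages, query, **generation_args):
    """Answer one query, reusing the precomputed cache for the fixed prefix."""
    messages = prefix_messages + [{"role": "user", "content": query}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(
        text, return_tensors="pt", add_special_tokens=False
    ).to(model.device)
    full_ids = inputs["input_ids"][0].tolist()

    # The cache is only valid if the full prompt starts with the cached tokens.
    if shared_prefix_len(prefix_ids, full_ids) != len(prefix_ids):
        raise ValueError("cached prefix is not a token-level prefix of the prompt")

    # generate() extends the cache in place, so each query gets its own copy.
    output = model.generate(
        **inputs, past_key_values=copy.deepcopy(cache), **generation_args
    )
    # Decode only the newly generated tokens.
    return tokenizer.decode(output[0][len(full_ids):], skip_special_tokens=True)
```

With the `model` and `tokenizer` from the snippet above, `build_prefix_cache(model, tokenizer, messages)` would be called once at startup, and `generate_with_cache(...)` once per query with arguments such as `max_new_tokens`. The full prompt is still tokenized on every call, but the model should only need to process the tokens that come after the cached prefix. The deep copy costs some memory and time per query, so whether this pays off depends on how long the fixed prefix is.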
