I'm trying to build a database query generator for my schema, and to get accurate results I pass a few user/assistant example interactions so the model follows my query format.
schema = ...  # database schema
messages = [
    {"role": "user", "content": "Find John"},
    {"role": "assistant", "content": "{\"response.someset.subset.match_name\": \"John\"}"},
    # ... and about 10 more examples like this
]
import os
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

current_directory = os.path.dirname("/workspace/Qwen2.5-Coder-1.5B-Instruct/")  # local model path
model = AutoModelForCausalLM.from_pretrained(
    current_directory,  # "microsoft/Phi-3.5-mini-4k-instruct"
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
#tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-4k-instruct")
tokenizer = AutoTokenizer.from_pretrained(current_directory)
print("starting")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
messages.append({"role": "user", "content": ...})  # the query I want to run
output = pipe(messages, **generation_args)  # generation_args defined elsewhere
The difficulty is that passing the long schema and all of these example messages on every call increases my latency a lot. Is there a way to pass them once and have the model "remember" them, so I don't have to resend them with every request?
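There is no built-in "memory", but since the schema and few-shot examples form a constant prefix, one common approach is to run the model over that prefix once, keep the resulting KV cache (`past_key_values`), and reuse a copy of it for every query so only the new tokens are processed. A minimal sketch of the idea, assuming a recent `transformers` version where `model.generate` accepts `past_key_values` (the heavy model calls are shown as comments since they need your loaded `model`/`tokenizer`; `split_static_prefix` is just an illustrative helper):

```python
import copy

def split_static_prefix(messages, query):
    """Keep the schema + few-shot examples as an immutable static prefix
    and build a fresh message list with only the new user query appended,
    instead of mutating the shared example list on every call."""
    return list(messages) + [{"role": "user", "content": query}]

# --- one-time setup (reusing `model`, `tokenizer`, `messages` from above) ---
# prefix_text = tokenizer.apply_chat_template(messages, tokenize=False)
# prefix_ids = tokenizer(prefix_text, return_tensors="pt").input_ids.to(model.device)
# with torch.no_grad():
#     prefix_cache = model(prefix_ids).past_key_values  # KV cache for the prefix

# --- per query: feed the full token sequence, but pass a copy of the cache
# --- so only the query tokens (the part past the cache) are recomputed ---
# full_msgs = split_static_prefix(messages, "Find Alice")
# full_text = tokenizer.apply_chat_template(full_msgs, tokenize=False,
#                                           add_generation_prompt=True)
# full_ids = tokenizer(full_text, return_tensors="pt").input_ids.to(model.device)
# out = model.generate(full_ids, past_key_values=copy.deepcopy(prefix_cache),
#                      max_new_tokens=64)
```

The `deepcopy` matters because generation mutates the cache in place, so the pristine prefix cache must be preserved between queries. Alternatively, serving stacks such as vLLM implement this automatically ("prefix caching"), which may be simpler than managing the cache by hand.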