Same GPT-NEO-XT model behaves differently with the same prompt and different contexts

Hello Experts,

I am trying to use the GPT-NEO-XT-20B model for a question-answering use case. I am using the prompt
prompts = [
""":{context}{question}:""",
]
or
prompts = [
""":As a customer service agent, your primary goal is to assist users to the best of your ability. The response requires detail, providing examples creates more detail in your responses. Answer truthfully. Provide detailed answer to the question based on the context.If couldn’t find answer say I don’t know. {context} {question} :""",
]

The model works reasonably well for some of the given contexts.

For a few other contexts, however, the model only works fine if I remove the leading and trailing : markers from the prompt. If I keep these keywords in the prompt, the model does not work at all for those contexts and always gives wrong answers.

Because the context is dynamic, we cannot decide in advance whether to build the prompt with or without the : tags.
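
To make the two variants concrete, here is a minimal sketch of how the prompts are assembled. The helper name `build_prompt`, the `use_markers` flag, and the sample context and question are made up for illustration; the marker strings are simply the leading and trailing : from the first prompt above, and the single-space separator is an assumption.

```python
# Hypothetical helper: builds the same prompt with or without the markers,
# so both variants can be compared on an identical context and question.
LEADING_MARKER = ":"   # marker placed before the context (as in the prompt above)
TRAILING_MARKER = ":"  # marker placed after the question


def build_prompt(context: str, question: str, use_markers: bool = True) -> str:
    """Join context and question, optionally wrapped in the markers."""
    body = f"{context.strip()} {question.strip()}"
    if use_markers:
        return f"{LEADING_MARKER}{body}{TRAILING_MARKER}"
    return body


# Made-up sample input, only to show the two resulting prompt strings.
context = "Orders ship within two business days."
question = "When will my order ship?"

with_markers = build_prompt(context, question, use_markers=True)
without_markers = build_prompt(context, question, use_markers=False)

print(repr(with_markers))
print(repr(without_markers))
```

Running both strings through the same generation call for every context is how I compare the two behaviours.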

What is the reason for this behavior, and how should it be handled in a production inference pipeline?

Not sure if it matters, but I observed the same behavior with both the INT8-quantized model and the unquantized (FP16) model.

Happy to share more details, if required.

Any suggestions?