Controlling AI's determinism - Score

Context:

I’m currently working on document-based score analysis using Gemini 2.0 Flash. The goal is to evaluate Document X by extracting certain metrics from it and comparing them against a reference document, Document Y: essentially comparing “what I have” (X) versus “what’s required” (Y).

Issue:

We’re using a consistent prompt to compare Document X against Document Y. However, we’ve noticed that the generated scores vary between runs, even when the input documents and prompt remain unchanged.

  • Most of the time, the score fluctuation is within a range of ~5%, which is acceptable.
  • But occasionally, we see a much larger variation — sometimes as high as 25%, which is problematic for our use case where reliability and consistency are critical.

Looking for Suggestions:

How can we reduce this inconsistency and make the score generation more stable and reliable?


You may already be doing this, but I think it’s better to set the temperature to 0 first.
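For reference, here is a minimal sketch of pinning the temperature, assuming the google-genai Python SDK; the model name, prompt, and API key are placeholders, and field names may differ in other client versions:

```python
# Minimal sketch, assuming the google-genai Python SDK (pip install google-genai).
# The model name, prompt, and API key below are placeholders.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Compare Document X against Document Y and return a score from 0 to 100.",
    config=types.GenerateContentConfig(
        temperature=0.0,  # greedy decoding: always prefer the most likely token
    ),
)
print(response.text)
```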


Yep, I tried temperature, topP, and topK and couldn’t achieve determinism. Thanks for sharing the paper; I noticed a new parameter, “seed”, which I hadn’t come across in my research and which looks interesting.
I’ll try it out, test it, and see whether it solves the problem.
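Roughly what I plan to try, again assuming the google-genai Python SDK; the seed field is my reading of GenerateContentConfig, and support may vary by model and API version:

```python
# Sketch of pinning temperature, topK, and a fixed seed together, assuming the
# google-genai Python SDK. The seed field is my reading of GenerateContentConfig;
# support may vary by model and API version.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

config = types.GenerateContentConfig(
    temperature=0.0,  # no sampling randomness
    top_k=1,          # consider only the single most likely token
    seed=42,          # fixed seed so identical requests should sample identically
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Compare Document X against Document Y and return a score from 0 to 100.",
    config=config,
)
print(response.text)
```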
Thanks for helping out and for your time.


Welcome to the forum, @priyatham10101!

I don’t know what the cause is, but it reminds me of chaos theory and sensitivity to initial conditions with floating-point numbers.

So my simple idea is to restart everything from a fresh boot and run it on its own: record the output, then run it again and see whether it differs (a rough sketch of that check is below).
But feel free to ignore my suggestions, as I’m not even running my own stuff here yet.
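In that spirit, here is an untested sketch of that repeat-and-compare check, assuming the same google-genai Python client as in the earlier replies; the prompt and the score-extraction regex are placeholders:

```python
# Quick repeatability check: send the identical request several times and see
# whether the outputs (and the extracted scores) actually differ between runs.
# Assumes the google-genai Python SDK; prompt and regex are placeholders.
import re
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
config = types.GenerateContentConfig(temperature=0.0, top_k=1, seed=42)
prompt = "Compare Document X against Document Y and return a score from 0 to 100."

outputs = []
for run in range(5):
    response = client.models.generate_content(
        model="gemini-2.0-flash", contents=prompt, config=config
    )
    outputs.append(response.text)

# Pull the first number out of each response as the "score" (placeholder logic).
scores = [float(m.group()) for m in (re.search(r"\d+(\.\d+)?", o) for o in outputs) if m]
print("identical outputs:", len(set(outputs)) == 1)
print("score spread:", max(scores) - min(scores) if scores else "no scores found")
```

If the spread stays large even with temperature 0 and a fixed seed, the variation likely isn’t coming from your sampling settings.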
