Llama-3 70B: probability outputs appear "quantized" with the non-quantized model (but not with the AWQ-quantized model)

We have been running some prompts against Llama-3 70B, using both the full model and the AWQ version (with the vLLM engine). Each prompt presents a sentence taken from an article, along with the full article for context, and asks the LLM whether a certain topic title is appropriate to describe the sentence, i.e. a yes/no question. However, instead of collecting the answer itself (expected to be either “yes” or “no”), we measure and record the logprobs of “yes” and “no” and convert them to a probability. The point is to obtain some kind of abstract measure of set “membership” (in the style of fuzzy sets). However, we seem to have hit on an interesting phenomenon, which I hope someone can explain. I am not 100% sure the effect is not caused by something we did ourselves; however, we have been over our code fairly exhaustively without discovering anything.
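For reference, here is a minimal sketch of how we collect the logprobs. It is simplified: the model path, `tensor_parallel_size`, prompt construction, and the exact token handling are placeholders rather than our real code.

```python
import math

from vllm import LLM, SamplingParams

# Placeholder model path and parallelism; the real run uses our own deployment.
llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct", tensor_parallel_size=8)

params = SamplingParams(
    temperature=0.0,
    max_tokens=1,   # we only need the first token of the answer
    logprobs=20,    # return the top-20 logprobs for that token position
)

def yes_no_probability(prompt: str) -> float:
    """P("yes") renormalized over {"yes", "no"} from the first-token logprobs."""
    out = llm.generate([prompt], params)[0]
    top = out.outputs[0].logprobs[0]  # dict: token_id -> Logprob
    lp = {"yes": float("-inf"), "no": float("-inf")}
    for entry in top.values():
        tok = (entry.decoded_token or "").strip().lower()
        # Keep the best-scoring tokenization variant (e.g. "Yes" vs " yes").
        if tok in lp and entry.logprob > lp[tok]:
            lp[tok] = entry.logprob
    # Assumes at least one of "yes"/"no" appears among the top-20 candidates.
    p_yes, p_no = math.exp(lp["yes"]), math.exp(lp["no"])
    return p_yes / (p_yes + p_no)
```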

The upshot is, with the full (non-quantized) Llama-3 70B, we seem to get “quantized” probabilities. That is, when e.g. 100 prompts are run (with 100 different sentences), we get clusters of values: perhaps 10 or so sentences will have exactly the same probabilities for both “yes” and “no”, down to the last decimal place, e.g. 0.5621765008857981. You might think we accidentally submitted exactly the same prompt; however, this has been double-checked, and the prompts and sentences are all different.
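To illustrate what we see, a check along these lines (with `probs` being a hypothetical list holding the ~100 recorded per-sentence “yes” probabilities) turns up the clusters:

```python
from collections import Counter

# probs: the recorded per-sentence probabilities (hypothetical variable).
for value, count in Counter(probs).most_common():
    if count > 1:
        print(f"{count} different sentences share the exact value {value}")
```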

Weirdly, this seems to happen with the non-quantized model, but not with the AWQ version, which appears to produce continuous probabilities. This seems odd. What are we missing?

Thanks in advance for any help…