Llama-3 70B: probability outputs appear "quantized" with the non-quantized model (but not with the AWQ-quantized model)

We have been running some prompts against Llama-3 70B, using both the full model and the AWQ version (with the vLLM engine). Each prompt presents a sentence taken from an article, along with the full article for context, and asks the LLM whether a certain topic title is appropriate to describe the sentence, i.e. a yes/no question. However, instead of collecting the answer itself (expected to be either “yes” or “no”), we measure and record the logprobs of “yes” and “no” and convert them to a probability. The point is to obtain some kind of abstract measure of set “membership” (in the style of fuzzy sets). However, we seem to have hit on an interesting phenomenon, which I hope someone can explain. I am not 100% sure the effect is not caused by something we did ourselves; however, we have been over our code fairly exhaustively without discovering anything.
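For reference, here is a minimal sketch of how we collect the logprobs. It is simplified: the model path, `tensor_parallel_size`, prompt construction, and the exact token handling are placeholders rather than our real code.

```python
import math

from vllm import LLM, SamplingParams

# Placeholder model path and parallelism; the real run uses our own deployment.
llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct", tensor_parallel_size=8)

params = SamplingParams(
    temperature=0.0,
    max_tokens=1,   # we only need the first token of the answer
    logprobs=20,    # return the top-20 logprobs for that token position
)

def yes_no_probability(prompt: str) -> float:
    """P("yes") renormalized over {"yes", "no"} from the first-token logprobs."""
    out = llm.generate([prompt], params)[0]
    top = out.outputs[0].logprobs[0]  # dict: token_id -> Logprob
    lp = {"yes": float("-inf"), "no": float("-inf")}
    for entry in top.values():
        tok = (entry.decoded_token or "").strip().lower()
        # Keep the best-scoring tokenization variant (e.g. "Yes" vs " yes").
        if tok in lp and entry.logprob > lp[tok]:
            lp[tok] = entry.logprob
    # Assumes at least one of "yes"/"no" appears among the top-20 candidates.
    p_yes, p_no = math.exp(lp["yes"]), math.exp(lp["no"])
    return p_yes / (p_yes + p_no)
```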

The upshot is, with the full (non-quantized) Llama-3 70B, we seem to get “quantized” probabilities. That is, when e.g. 100 prompts are run (with 100 different sentences), we get clusters of values: perhaps 10 or so sentences will have exactly the same probabilities for both “yes” and “no”, down to the last decimal place, e.g. 0.5621765008857981. You might think we accidentally submitted exactly the same prompt; however, this has been double-checked, and the prompts and sentences are all different.
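To illustrate what we see, a check along these lines (with `probs` being a hypothetical list holding the ~100 recorded per-sentence “yes” probabilities) turns up the clusters:

```python
from collections import Counter

# probs: the recorded per-sentence probabilities (hypothetical variable).
for value, count in Counter(probs).most_common():
    if count > 1:
        print(f"{count} different sentences share the exact value {value}")
```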

Weirdly, this seems to happen with the non-quantized model, but not with the AWQ version, which appears to produce continuous probabilities. This seems odd. What are we missing?

Thanks in advance for any help…