Few Shot Learning makes LLM hallucinate with slightly different prompt template

EquinoxElahin · November 15, 2023, 4:21pm

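Hi, I’ve just noticed something odd about “few-shot prompting” (adding examples to your prompt) when the prompt ends with a space.
Here is an example to make it crystal clear.
I want an LLM to do a very basic task, here identifying an address (the same odd behavior shows up on other tasks). I gave Vigogne-2-7b-chat the prompt below. It is in French, sorry: the system message asks the model to identify the address in the text and to return "Aucune adresse" (no address) when there is none, and the final sentence, "J'ai un problème d'eau chaude" (I have a hot water problem), contains no address.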

"""<|system|>: Vous êtes un assistant IA. Vous devez identifiez l'adresse dans le texte suivant. Si il n'y a pas d'adresse dans la phrase, retourne "Aucune adresse".
<|user|>: Exemples :
- Phrase : J'habite au 55 rue de la pompe.
- Adresse : 55 rue de la pompe.
- Phrase : Mon immeuble situé au 100 cours de Vincennes dans le 20ème est menacé par des rats.
- Adresse : 100 cours de Vincennes.
- Phrase : J'aimerais des infos concernant la norme RT60 lié à l'isolation de son logement.
- Adresse : Aucune adresse.
- Phrase : J'habite au 22 avenue du général Leclerc.
- Adresse : 22 avenue du général Leclerc.

- Phrase : J'ai un problème d'eau chaude.
<|assistant|>: Adresse :"""

Most of the time this works well: under greedy decoding, the token ‘A’ (the start of “Aucune adresse”) has 99% probability. But things change when I add a space at the end of the prompt:

"""<|system|>: Vous êtes un assistant IA. Vous devez identifiez l'adresse dans le texte suivant. Si il n'y a pas d'adresse dans la phrase, retourne "Aucune adresse".
<|user|>: Exemples :
- Phrase : J'habite au 55 rue de la pompe.
- Adresse : 55 rue de la pompe.
- Phrase : Mon immeuble situé au 100 cours de Vincennes dans le 20ème est menacé par des rats.
- Adresse : 100 cours de Vincennes.
- Phrase : J'aimerais des infos concernant la norme RT60 lié à l'isolation de son logement.
- Adresse : Aucune adresse.
- Phrase : J'habite au 22 avenue du général Leclerc.
- Adresse : 22 avenue du général Leclerc.

- Phrase : J'ai un problème d'eau chaude.
<|assistant|>: Adresse : """

With the trailing space, it very often hallucinates in favor of tokens that appear in my examples: the token ‘2’ has 35%, ‘5’ has 28%, and ‘A’ only 20%.
So, first observation: the space at the end of the prompt really messes up the generation (still with greedy decoding). Does anyone have a clue or a remark about this?
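
One plausible explanation, offered here as a hypothesis and not as something verified on Vigogne itself: SentencePiece-style tokenizers, which the Llama family uses, attach the space to the start of the following word piece (written `▁`). A prompt that ends in a space therefore ends in a lone `▁` token, and the usual continuation `▁Aucune` can no longer follow it. Numbers, by contrast, are commonly split so that the space stands alone before the digits, which would make digits the familiar continuation after a lone `▁`. The toy tokenizer below only illustrates that mechanism. `TOY_VOCAB` and `toy_tokenize` are made up for this sketch, and greedy longest-match is a simplification of what a real tokenizer does.

```python
# Toy illustration only: this is NOT the real Vigogne/Llama tokenizer.
# Assumption: spaces are rewritten to "▁" and normally sit at the START of
# the next word piece, while digits have no "▁"-prefixed piece of their own.
TOY_VOCAB = {"▁Adresse", "▁:", "▁Aucune", "▁", "2", "5"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match segmentation over a hypothetical vocabulary."""
    s = "▁" + text.replace(" ", "▁")
    tokens = []
    i = 0
    while i < len(s):
        # Try the longest candidate piece first, then shrink.
        for j in range(len(s), i, -1):
            if s[i:j] in vocab:
                tokens.append(s[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {s[i:]!r}")
    return tokens


print(toy_tokenize("Adresse :"))         # prompt without trailing space
print(toy_tokenize("Adresse : "))        # prompt with trailing space: ends in a lone "▁"
print(toy_tokenize("Adresse : Aucune"))  # expected answer: "▁Aucune" follows "▁:"
print(toy_tokenize("Adresse : 55"))      # few-shot example: digits follow a lone "▁"
```

In this toy setting, the only place where a lone `▁` appears after `▁:` is right before a digit, which is the pattern the few-shot examples show several times. To check whether the real tokenizer behaves the same way, tokenize both prompts and compare the last few token ids.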

Then I ran the same comparison (with and without a space at the end of the prompt) but without the examples, and the LLM no longer hallucinates. The token ‘A’ has a very high probability in both cases, and of course there are no digit tokens in sight.
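
The four cases described so far (examples or no examples, trailing space or no trailing space) form a 2x2 grid, and they are easier to compare when every prompt comes from the same builder. A minimal sketch, assuming a simplified version of the template above; `build_variants` and its arguments are hypothetical names, not part of any library.

```python
from itertools import product


def build_variants(system, examples, query, answer_prefix="Adresse :"):
    """Return the 2x2 grid of prompts keyed by (with_examples, trailing_space)."""
    variants = {}
    for with_examples, trailing_space in product((True, False), repeat=2):
        # The few-shot block, when present, goes before the query.
        user = (examples + "\n\n" if with_examples else "") + query
        prompt = (
            f"<|system|>: {system}\n"
            f"<|user|>: {user}\n"
            f"<|assistant|>: {answer_prefix}"
        )
        if trailing_space:
            prompt += " "  # the single character under test
        variants[(with_examples, trailing_space)] = prompt
    return variants
```

Each of the four strings can then be scored with the same model under greedy decoding, so that the only differences between runs are the two factors under test.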

Has anyone already faced this kind of situation, and do you have any thoughts about it?
With an imperfect prompt format (a space at the end), the few-shot examples make the LLM hallucinate, while with a clean prompt (no space at the end) it works well.
We already knew that prompt templating matters, but this goes a step further.
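
If the trailing space is only an accident of string formatting, a small guard before generation is a cheap safeguard. A minimal sketch; `strip_trailing_blanks` is a hypothetical helper, and it deliberately leaves newlines alone because some chat templates end with one on purpose.

```python
def strip_trailing_blanks(prompt: str) -> str:
    """Remove trailing spaces and tabs, but keep any trailing newline."""
    return prompt.rstrip(" \t")


prompt = "<|assistant|>: Adresse : "
clean = strip_trailing_blanks(prompt)
```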

EquinoxElahin · November 16, 2023, 9:39am

github.com/facebookresearch/llama

Weird bias towards numbers after a generic prompt

vigna · opened 10:24PM, 19 Mar 23 UTC · closed 06:13PM, 22 Oct 23 UTC · model-usage

In all the models up to 30B, using the standard parameters from `example.py` (and many variations on them), the continuations of the prompt "The first image that comes to my mind is " all start with a number. It can be a date, an actual number, some numbered passage for the Gospel, etc., but the token after "is" is always a number. I tried also invoking `half()` on the model, play with temperature etc. but I couldn't change this behavior. That looks like a really weird bias to me. Am I wrong? I also cross-checked with the [C++ implementation](https://github.com/ggerganov/llama.cpp). In that case, the behavior stops after the number of token to predict goes beyond 200. So I guess there's something different in the initialization of the model (that I couldn't understand).

Here is an answer
