ZERO loss while finetuning Llama2 using SFT trainer and the use of collator

Hessa · November 28, 2023, 9:00am

Hello everyone,

My code is:

response_template = "Answer: [/INST]"
collator = DataCollatorForCompletionOnlyLM(response_template=response_template, tokenizer=tokenizer)

example = """<s>[INST] <<SYS>> Please select the correct answer from the given multiple Options based on the given Context: <</SYS>> Context: Geology is the study of the Earths solid material and structures and the processes that create them. Some ideas geologists might consider include how rocks and landforms are created or the composition of rocks, minerals, or various landforms. Geologists consider how natural processes create and destroy materials on Earth, and how humans can use Earth materials as resources, among other topics. Geologists study rocks in the field to learn what they can from them. Question: Earth science is the study of Options:(A) solid Earth (B) Earths oceans (C) Earths atmosphere (D) all of the above Answer: [/INST] D </s>"""

example_encoded = tokenizer(example)
collator([example_encoded])

So I’m using the collator to compute the loss only on the answer predicted by the Llama2 model, as pointed out by @BayesRulez (thanks to you!). However, the loss I get is zero at every training step.

This output is printed while fine-tuning:


Context: Your sense of taste is controlled by sensory neurons, or nerve cells, on your tongue that sense the chemicals in food. The neurons are grouped in bundles within taste buds. Each taste bud actually has a pore that opens out to the surface of the tongue enabling molecules and ions taken into the mouth to reach the receptor cells inside. There are five different types of taste neurons on the tongue. Each type detects a different taste. The tastes are: 1. Sweet, which is produced by the presence of sugars, such as the common table sugar sucrose, and a few other substances. 2. Salty, which is produced primarily by the presence of sodium ions. Common salt is sodium chloride, NaCl. The use of salt can donate the sodium ion producing this taste. 3. Sour, which is the taste that detects acidity. The most common food group that contains naturally sour foods is fruit, such as lemon, grape, orange, and sometimes melon. Children show a greater enjoyment of sour flavors than adults, and sour candy such as Lemon Drops, Shock Tarts and sour versions of Skittles and Starburst, is popular. Many of these candies contain citric acid. 4. Bitter is an unpleasant, sharp, or disagreeable taste. Common bitter foods and beverages include coffee, unsweetened cocoa, beer (due to hops), olives, and citrus peel. 5. Umami, which is a meaty or savory taste. This taste can be found in fish, shellfish, cured meats, mushrooms, cheese, tomatoes, grains, and beans. A single taste bud contains 50100 taste cells representing all 5 taste sensations. A stimulated taste receptor cell triggers action potentials in a nearby sensory neuron, which send messages to the brain about the taste. The brain then decides what tastes you are sensing. 
Question: which taste will be associated with citrus fruits?
Options:(A) sweet (B) sour (C) salty (D) bitter
Answer: [/INST] B </s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s> This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the `max_seq_length`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/utils.py:120: UserWarning: Could not find response key `Answer: [/INST]` in the following instance: <s><s> [INST] <<SYS>>
 Please select the correct answer from the given multiple Options based on the given Context:
<</SYS>>

Is there something I’m missing that needs to be fixed?

sarx11 · December 9, 2023, 6:47am

Hi, this happens because the response template is not found in your training examples, so all of their tokens are ignored in the loss calculation. Some tokenizers tokenize the same text differently depending on the text around it. As a result, tokenizing the response_template on its own returns different token ids than the ones the same text gets inside the example string.

The following will fix your issue:

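response_template = "Answer: [/INST]"
# Encode the template without special tokens and drop the first two ids, which may not match the ids the template gets in context
response_template_ids = tokenizer.encode(response_template, add_special_tokens=False)[2:]
collator = DataCollatorForCompletionOnlyLM(response_template_ids, tokenizer=tokenizer)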
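To see why a missing template leads to zero loss, here is a minimal sketch of the masking idea in plain Python. It is not TRL's actual implementation: the helper names find_subsequence and mask_prompt_labels are hypothetical, and the token ids are made up for illustration rather than real Llama 2 ids. The only assumption taken from the libraries is that labels set to -100 are skipped by the loss.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(haystack: List[int], needle: List[int]) -> Optional[int]:
    """Return the start index of the first occurrence of needle, or None."""
    if not needle:
        return None
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return None


def mask_prompt_labels(input_ids: List[int], template_ids: List[int]) -> List[int]:
    """Hypothetical helper: keep labels only for tokens after the template.

    If the template ids are not found, every label is ignored, so the
    example contributes nothing to the loss.
    """
    start = find_subsequence(input_ids, template_ids)
    if start is None:
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(template_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Made-up token ids, for illustration only (not real Llama 2 ids).
example_ids = [1, 50, 51, 900, 901, 902, 77, 2]  # prompt, template, answer, end token
template_standalone = [29, 899, 901, 902]        # template encoded on its own

# The standalone ids never appear in the example: every label is ignored.
print(mask_prompt_labels(example_ids, template_standalone))
# [-100, -100, -100, -100, -100, -100, -100, -100]

# Dropping the leading ids leaves a suffix that does appear in the example.
print(mask_prompt_labels(example_ids, template_standalone[2:]))
# [-100, -100, -100, -100, -100, -100, 77, 2]
```

With a real tokenizer, the same check can be done by comparing the ids of the sliced template against tokenizer(example)["input_ids"] before starting training. How many leading ids to drop depends on the tokenizer and the template, so it is worth verifying rather than assuming [2:] always fits.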

Reference: Supervised Fine-tuning Trainer
