I have a tokenized dataset where each sample is a chunk of 8192 tokens. A chunk may contain several original dataset samples packed together (if they are short), or a slice of a single long original sample. Some original samples have the format `just a text without anything<END_OF_TEXT>`, while others have the format `beginning of the text<FIX_TOKEN>end of the text<END_OF_TEXT>`. Naturally, samples in both formats can appear inside the same 8192-token chunk. What I want to do: if a sample is in the first format, compute the normalized cross-entropy loss on everything up to the <END_OF_TEXT> token; if it is in the second format, compute the loss only on the tokens from <FIX_TOKEN> to <END_OF_TEXT>.
For example, if the decoded 8192-token chunk is `just a text without anything<END_OF_TEXT>beginning of the text<FIX_TOKEN>end of text<END_OF_TEXT>`, I want to compute the loss from the beginning up to the first <END_OF_TEXT> token, and from <FIX_TOKEN> up to the second <END_OF_TEXT> token. Attention should still see the tokens of "beginning of the text"; they just should not contribute to the loss. Then I want the mean loss over the whole 8192-token chunk, implemented by subclassing the transformers Trainer class. Is this possible?
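One common way to answer my own setup (sketched here, not the only way): keep the attention mask fully on so the prefix tokens are still attended to, and express "no loss here" by setting those positions in the labels to -100, which PyTorch's cross entropy ignores. Everything below is a minimal sketch under assumptions: the token ids `FIX_ID`/`EOS_ID` and the names `build_labels`, `masked_lm_loss`, `MaskedLossTrainer` are hypothetical; the real ids would come from `tokenizer.convert_tokens_to_ids(...)`.

```python
# Sketch: mask prefix tokens with -100 and compute the loss in a custom
# Trainer. FIX_ID / EOS_ID and all helper names are hypothetical; adapt
# them to your tokenizer (tokenizer.convert_tokens_to_ids(...)).
import torch
import torch.nn.functional as F
from transformers import Trainer


def build_labels(input_ids: torch.Tensor, fix_id: int, eos_id: int) -> torch.Tensor:
    """Copy input_ids and set to -100 every position that should not
    contribute to the loss: inside each <END_OF_TEXT>-delimited segment,
    if a <FIX_TOKEN> is present, mask everything up to and including it."""
    labels = input_ids.clone()
    for row in range(input_ids.size(0)):
        ids = input_ids[row]
        ends = (ids == eos_id).nonzero(as_tuple=True)[0].tolist()
        ends.append(ids.size(0) - 1)  # last segment may be cut by the chunk boundary
        start = 0
        for end in ends:
            seg = ids[start:end + 1]
            fix = (seg == fix_id).nonzero(as_tuple=True)[0]
            if fix.numel() > 0:
                # second format: ignore "beginning of the text" + <FIX_TOKEN>
                labels[row, start:start + fix[0].item() + 1] = -100
            start = end + 1
    return labels


def masked_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Standard causal shift (token t predicts token t+1); -100 labels are
    skipped, so the result is the mean over the unmasked positions only."""
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )


class MaskedLossTrainer(Trainer):
    """Builds the masked labels on the fly. The attention mask is left
    untouched, so the model still attends to the masked prefix tokens."""
    FIX_ID = 32001  # hypothetical id of <FIX_TOKEN>
    EOS_ID = 32000  # hypothetical id of <END_OF_TEXT>

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = build_labels(inputs["input_ids"], self.FIX_ID, self.EOS_ID)
        outputs = model(
            input_ids=inputs["input_ids"],
            attention_mask=inputs.get("attention_mask"),
        )
        loss = masked_lm_loss(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss
```

Alternatively, the labels could be built once in the data collator and passed to the model directly, since the model's built-in loss also honors -100. One caveat: `cross_entropy` with `ignore_index` averages over all unmasked tokens in the batch, which is not exactly "mean per 8192-token chunk, then mean over chunks"; if that exact normalization matters, compute the loss with `reduction="none"` and average per row manually.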