Hi,
I am fine-tuning Llama2 for a particular use case. During evaluation, I want to measure the model's performance on my downstream task, which means I need to call model.generate the same way I would during inference. From browsing the documentation, one approach seems to be to create a callback, something like this (pseudo code):
from transformers import TrainerCallback

class MyCallBack(TrainerCallback):
    def on_evaluate(self, args, state, control, model=None, tokenizer=None, **kwargs):
        # the Trainer passes model and tokenizer to the callback via kwargs
        tokens = tokenizer("text", return_tensors="pt").to(model.device)
        generated_ids = model.generate(tokens["input_ids"], attention_mask=tokens["attention_mask"])
        generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
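In case it is relevant, I would attach the callback roughly like this (model, tokenizer, train_dataset, and eval_dataset here are placeholders from my own setup):

from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", evaluation_strategy="steps", eval_steps=500),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    callbacks=[MyCallBack()],
)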
I have a few questions regarding this approach:
- Do I need to do any special treatment of the model before calling generate, i.e. do I need to call model.eval() so that gradients are not computed unnecessarily?
- If I have loaded the model in quantized mode, do I need to take care of anything extra when using model.generate?
- Can I override the model's default generation config here without affecting training (roughly as in the sketch below)?
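To make these questions concrete, this is roughly what I imagine the generation step would look like; generate_for_eval is just a hypothetical helper and the GenerationConfig values are placeholders:

import torch
from transformers import GenerationConfig

def generate_for_eval(model, tokenizer, prompt):
    was_training = model.training
    model.eval()  # question 1: is this switch (and torch.no_grad below) actually needed?
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    gen_config = GenerationConfig(max_new_tokens=64, do_sample=False)  # question 3: per-call config
    with torch.no_grad():
        output_ids = model.generate(**inputs, generation_config=gen_config)
    if was_training:
        model.train()  # restore training mode so the Trainer continues unchanged
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)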
Another option which could be helpful is trainer.predict, which is provided by the Trainer API. However, I am not sure whether this is the same as calling model.generate, i.e. does it generate each next token based on the model's own previously generated tokens, or based on the correct tokens from the input (teacher forcing)?
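For reference, this is roughly how I would call it (eval_dataset is a placeholder); my uncertainty is about what the returned predictions actually contain:

pred_output = trainer.predict(eval_dataset)
# Are pred_output.predictions per-position logits from a single teacher-forced
# forward pass over the labels, or token ids decoded autoregressively the way
# model.generate would produce them?
print(pred_output.metrics)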