Is it possible to evaluate generations/output while fine-tuning an LLM?

Is it possible to run some prompts and generate outputs for these prompts during fine-tuning (with transformers.Trainer), e.g. on each eval step?

I’ve seen some predictions in the W&B report for this blog post, but I am not sure whether those predictions were made by loading the respective checkpoints.

If someone stumbles upon the same question: custom callbacks that run on evaluation are the place to look. A sketch of such a callback is below.
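
A minimal sketch of what such a callback could look like, assuming a causal LM fine-tuned with `transformers.Trainer`. The class name `GenerationCallback`, the example prompts, the `max_new_tokens` value, and logging via `print()` are all placeholders to adapt to your setup; the `on_evaluate` hook and the `model` keyword argument are part of the `TrainerCallback` API.

```python
import torch
from transformers import TrainerCallback


class GenerationCallback(TrainerCallback):
    """Generate from a few fixed prompts every time the Trainer evaluates."""

    def __init__(self, tokenizer, prompts, max_new_tokens=64):
        self.tokenizer = tokenizer
        self.prompts = prompts
        self.max_new_tokens = max_new_tokens

    def on_evaluate(self, args, state, control, model=None, **kwargs):
        # Called after each evaluation; the Trainer passes the model in via kwargs.
        if model is None:
            return
        model.eval()
        for prompt in self.prompts:
            inputs = self.tokenizer(prompt, return_tensors="pt").to(model.device)
            with torch.no_grad():
                output_ids = model.generate(**inputs, max_new_tokens=self.max_new_tokens)
            text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
            # Replace print() with your preferred logger (e.g. wandb.log) if needed.
            print(f"[step {state.global_step}] {text}")
```

Register it before training, e.g. `trainer.add_callback(GenerationCallback(tokenizer, ["Once upon a time"]))`, and the generations will appear on each eval step.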


Late here, but Braintrust is a great tool/platform for evaluating your LLM. We have a simple library in Python/TypeScript for running and logging evaluations, so you can use our web UI to dig into the results.

It’s free to use @ https://braintrustdata.com/