TRL - Fine tuned small model (facebook350m) yields many empty inferences

John6666 · June 19, 2025, 7:57am

The options you need to pass to the trainer may be quite different for this model than for others. This looks similar:
https://stackoverflow.com/questions/76857722/huggingface-sft-for-completion-only-not-working
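The linked question is about completion-only SFT, where the loss is computed only on the response tokens. As a rough illustration of that idea (not TRL's actual implementation), the sketch below masks everything up to and including a response template with `-100`, the label value that PyTorch's cross-entropy loss ignores by default. The function names and token ids are made up for the example. One thing it makes visible: if the template's token ids never appear in the sequence, every label ends up masked and the example contributes nothing to training, which may be worth ruling out when a fine-tuned model produces empty outputs.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(seq, sub):
    """Return the index where `sub` first occurs in `seq`, or -1 if absent."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def completion_only_labels(input_ids, response_template_ids):
    """Hypothetical helper: keep labels only for tokens after the template."""
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: the whole example is masked out of the loss.
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(response_template_ids)
    return [IGNORE_INDEX] * cut + list(input_ids[cut:])


# Made-up token ids: prompt = [5, 6], template = [7, 8], response = [9, 10]
labels_ok = completion_only_labels([5, 6, 7, 8, 9, 10], [7, 8])
# Same sequence, but the template was tokenized differently and never matches
labels_missing = completion_only_labels([5, 6, 7, 8, 9, 10], [7, 11])

print(labels_ok)
print(labels_missing)
```

Counting how many labels in a real batch differ from `-100` is a quick way to check whether the trainer is actually seeing any response tokens.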

github.com/huggingface/trl

`RewardTrainer` hits NaN output with quantized pretrained model

opened 04:45AM - 21 Jan 24 UTC by chenmoneygithub, closed 03:05PM - 08 Mar 24 UTC

Hi team, I am trying to tune my reward model (`opt-350m`) via `RewardTrainer`… While it works fine without quantization, using int4 + LoRA hits the NaN problem after <10 steps. Here is the reproducible code: [github gist](https://colab.research.google.com/gist/chenmoneygithub/a2e0895b0dba6e49b440686cda5ed01b/reproduce-reward-model-nan-issue.ipynb?authuser=1#scrollTo=FFTBZa0iTkwf), tested in an A100 environment. Could anyone provide any insight? I am also wondering how people debug quantization-related issues when using the HuggingFace trainer: is it possible to print out the gradients and the outputs of certain layers inside the model? Thanks a lot!
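On the question of inspecting layer outputs and gradients: plain PyTorch hooks can do this for any `nn.Module`, so something along these lines may help. This is a minimal sketch on a toy model, not the setup from the issue; `attach_nan_hooks` and `nonfinite_grads` are hypothetical helper names, and the first layer is broken on purpose so there is something to find.

```python
import torch
from torch import nn


def attach_nan_hooks(model):
    """Record the names of submodules whose forward output is not finite."""
    bad_layers, handles = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                bad_layers.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(make_hook(name)))
    return bad_layers, handles


def nonfinite_grads(model):
    """List parameters whose gradient contains NaN or inf after backward()."""
    return [
        name
        for name, p in model.named_parameters()
        if p.grad is not None and not torch.isfinite(p.grad).all()
    ]


# Toy model standing in for the real one; layer "0" is deliberately corrupted.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
with torch.no_grad():
    model[0].weight.fill_(float("inf"))

bad_layers, handles = attach_nan_hooks(model)
out = model(torch.ones(2, 4))
out.sum().backward()
bad_grads = nonfinite_grads(model)

for h in handles:
    h.remove()  # detach hooks once debugging is done

print("first non-finite output at layer:", bad_layers[0])
print("parameters with non-finite grads:", bad_grads)
```

The first entry in `bad_layers` is usually the most useful one, since everything downstream of it will be non-finite too. With the HF `Trainer`, the hooks can be attached to the model before calling `train()`, and the gradient check can be run from a training callback or a manual training step.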
