I know what AutoModelForCausalLM
is. What I'm asking is: in the PEFT
LoRA fine-tuning tutorial, the authors use AutoModelForCausalLMWithValueHead
, while if you pick almost any other code or notebook on fine-tuning an LLM with PEFT,
you'll find AutoModelForCausalLM
being used.
I went to look at the official documentation of AutoModelForCausalLMWithValueHead
and found:
An autoregressive model with a value head in addition to the language model head
What I want to ask is: how, where, and more importantly, WHY is this extra ValueHead
used?
While looking for an answer to this question, I came across this discussion. Is it helpful? what is AutoModelForCausalLMWithValueHead? · Issue #180 · huggingface/trl · GitHub
First things first, this additional ValueHead has nothing to do with PEFT.
Mainly, PPO optimization (an RLHF technique) relies on computing "advantages" associated with taking a particular action (in this case, selecting a token) in a particular state. The advantage is the value of the (state, action) pair minus the value of being in that state, i.e. A(s, a) = Q(s, a) − V(s). You can review the exact calculation in this function: trl/trl/trainer/ppo_trainer.py at main · huggingface/trl · GitHub
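To make the advantage computation concrete, here is a simplified plain-Python sketch of generalized advantage estimation (GAE), which is the technique PPO trainers typically use. This is not TRL's exact code (the real implementation is tensorized and handles attention masks); the function name and the zero bootstrap value at the end of the sequence are my assumptions for illustration:

```python
def compute_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Simplified GAE sketch: one scalar reward and one value estimate per token.

    Assumes the value after the last token is 0 (no bootstrap) -- a
    simplification for illustration, not TRL's exact behavior.
    """
    advantages = []
    last_advantage = 0.0
    # Walk backwards over the token sequence.
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # TD error: how much better the observed step was than predicted.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        last_advantage = delta + gamma * lam * last_advantage
        advantages.append(last_advantage)
    return advantages[::-1]

# With gamma = lam = 1 the advantage is just the sum of future TD errors:
print(compute_advantages([1.0, 1.0], [0.5, 0.5], gamma=1.0, lam=1.0))
```

The `values` here are exactly what the ValueHead produces: one scalar estimate per token position.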
The additional ValueHead simply projects the last hidden states onto a scalar to estimate the value of a state. Check the ValueHead class implementation here: trl/trl/models/modeling_value_head.py at main · huggingface/trl · GitHub
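As a rough illustration (a minimal sketch, not TRL's actual implementation — the class name, dropout rate, and hidden size here are assumptions), a value head of this kind is essentially a dropout plus a linear projection from the model's hidden size down to 1, applied at every token position:

```python
import torch
import torch.nn as nn

class ValueHead(nn.Module):
    """Sketch of a value head: projects each hidden state to a scalar value."""

    def __init__(self, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # One scalar per position: V(state) estimated from the hidden state.
        self.summary = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the base model
        values = self.summary(self.dropout(hidden_states))
        return values.squeeze(-1)  # (batch, seq_len)

# Toy usage with a made-up hidden size of 16:
head = ValueHead(hidden_size=16)
values = head(torch.randn(2, 5, 16))
print(values.shape)  # one value per token position
```

In the wrapped model, the base LM still produces logits through its language-model head; this extra head just reads the same last hidden states to produce the per-token value estimates that PPO needs.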
Note: The ValueHead class is only needed if you plan to perform training/re-training.