Difference between AutoModelForCausalLMWithValueHead and AutoModelForCausalLM

I know what AutoModelForCausalLM is. What I'm asking is: in the PEFT LoRA fine-tuning tutorial, the authors use AutoModelForCausalLMWithValueHead, yet if you pick almost any other code or notebook on PEFT-style fine-tuning of an LLM, you'll find AutoModelForCausalLM being used.

I looked at the official documentation of AutoModelForCausalLMWithValueHead and found:

An autoregressive model with a value head in addition to the language model head

What I want to ask is: how, where, and more importantly, WHY is this extra ValueHead used?


While looking for an answer to this question, I came across this discussion. Is it helpful? what is AutoModelForCausalLMWithValueHead? · Issue #180 · huggingface/trl · GitHub

First things first: this additional ValueHead has nothing to do with PEFT.

Mainly, PPO optimization (an RLHF technique) relies on computing "advantages" associated with taking a particular action (in this case, selecting a token) in a particular state. The advantage is the value of the (state, action) pair minus the value of being in the state. You can review the exact calculation in this function: trl/trl/trainer/ppo_trainer.py at main · huggingface/trl · GitHub
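To make the idea concrete, here is a minimal, hedged sketch of a GAE-style advantage computation over per-token rewards and value estimates. This is an illustration of the general recursion, not TRL's exact implementation; the function name and the simplified inputs are my own.

```python
def compute_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation (GAE) sketch.

    rewards[t]: reward received for the token chosen at step t
    values[t]:  ValueHead estimate of the value of the state at step t
    Returns one advantage per step: roughly Q(s, a) - V(s), smoothed by lam.
    """
    advantages = []
    last_adv = 0.0
    # Walk the sequence backwards so each step can reuse the next step's advantage.
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # TD residual: one-step estimate of Q(s, a) minus V(s)
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recursion: exponentially weighted sum of future TD residuals
        last_adv = delta + gamma * lam * last_adv
        advantages.append(last_adv)
    return advantages[::-1]
```

With gamma = lam = 1 this reduces to "sum of future rewards minus the current value estimate", which is the plain advantage Q(s, a) - V(s).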

The additional ValueHead simply projects the last hidden state onto a scalar to estimate the value of a state. Check the ValueHead class implementation here: trl/trl/models/modeling_value_head.py at main · huggingface/trl · GitHub
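The projection itself is just a learned linear layer from hidden_size to 1 (in TRL it's a torch nn.Linear, possibly with dropout). A dependency-free sketch of that operation, with placeholder weights rather than trained ones:

```python
def value_head(hidden_state, weights, bias=0.0):
    """Sketch of a value head: a linear projection of the model's last
    hidden state (a vector of length hidden_size) onto a single scalar.

    hidden_state: list[float], the final hidden state for one token position
    weights:      list[float] of the same length, the learned projection
    Returns the scalar value estimate V(s) for that state.
    """
    return sum(h * w for h, w in zip(hidden_state, weights)) + bias
```

During PPO training, this scalar per token position supplies the V(s) terms used in the advantage computation above; the language model head, in contrast, projects the same hidden state onto a full vocabulary-sized logit vector.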

Note: the ValueHead class is only needed if you plan to perform training/re-training; it plays no role at inference time.