What is the difference between CausalLM and LMHeadModel? Both return similar outputs (loss, logits, etc.).
Example: GPT2LMHeadModel.from_pretrained('gpt2') and AutoModelForCausalLM.from_pretrained('gpt2') have the same model structure.
LMHeadModel is an old name we used for some models, but we stopped using it because it's not very informative about what kind of language model head we're talking about. To avoid breaking changes, we won't rename the old classes, but the auto API and all newer models should use ForCausalLM, ForMaskedLM, or ForSeq2SeqLM, depending on what kind of LM objective the model has.
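To make this concrete, here is a minimal sketch showing that, for GPT-2, the auto class resolves to the older GPT2LMHeadModel class rather than to a separate architecture. The tiny configuration values below are assumptions picked only so the example runs quickly with random weights and without downloading a checkpoint; AutoModelForCausalLM.from_pretrained('gpt2') goes through the same class lookup.

```python
import torch
from transformers import AutoModelForCausalLM, GPT2Config, GPT2LMHeadModel

# Tiny, randomly initialized GPT-2 config (illustrative values, not the real 'gpt2' sizes).
config = GPT2Config(vocab_size=100, n_positions=16, n_embd=8, n_layer=1, n_head=2)

# The auto class looks up the concrete class registered for this config type.
model = AutoModelForCausalLM.from_config(config)

# For GPT-2, that concrete class is the older-named GPT2LMHeadModel.
print(type(model).__name__)
print(isinstance(model, GPT2LMHeadModel))

# Passing labels returns both a loss and logits, as noted in the question.
input_ids = torch.tensor([[1, 2, 3, 4]])
outputs = model(input_ids=input_ids, labels=input_ids)
print(outputs.loss.shape, outputs.logits.shape)
```

Since both names end up at the same class, which one to use is mostly a readability choice: the auto class keeps the code independent of the architecture, while the explicit class states it outright.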