Difference between CausalLM and LMHeadModel

What is the difference between CausalLM and LMHeadModel? Both return similar outputs: loss, logits, etc.

Example: GPT2LMHeadModel.from_pretrained('gpt2') and AutoModelForCausalLM.from_pretrained('gpt2') have the same model structure.

The LMHeadModel names are older names we used for some models, but we stopped using them because they are not very informative about what kind of language modeling head is meant. To avoid breaking changes, we won't rename the old classes, but the auto API and all newer models should have ForCausalLM, ForMaskedLM, or ForSeq2SeqLM in their names, depending on what kind of LM objective the model has.
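
For GPT-2 specifically, you can check this yourself. Here is a minimal sketch, assuming transformers and PyTorch are installed. The tiny config sizes are arbitrary values picked for illustration so that nothing needs to be downloaded; they are not the real gpt2 hyperparameters.

```python
from transformers import AutoModelForCausalLM, GPT2Config, GPT2LMHeadModel

# Tiny, randomly initialised GPT-2 config (illustrative sizes, not the real gpt2 ones).
config = GPT2Config(n_layer=1, n_head=2, n_embd=8, vocab_size=32, n_positions=16)

# The auto class looks at the config type and picks the matching causal LM class.
auto_model = AutoModelForCausalLM.from_config(config)

# The older, model-specific name, instantiated directly.
direct_model = GPT2LMHeadModel(config)

# For a GPT-2 config, both objects are instances of the same class.
same_class = type(auto_model) is type(direct_model)
print(type(auto_model).__name__, same_class)
```

The same check should carry over to from_pretrained('gpt2'), since the auto class dispatches on the checkpoint's config in the same way.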
