About the origin of the model category names in `AutoModelWithLMHead`

yusukemori · December 21, 2020, 4:25pm

Hello,

I’d like to ask about where the model category names come from.

In the AutoModelWithLMHead class, the warning says we should use AutoModelForCausalLM, AutoModelForMaskedLM, or AutoModelForSeq2SeqLM instead.

class AutoModelWithLMHead:
    r"""
    This is a generic model class that will be instantiated as one of the model classes of the library---with a
    language modeling head---when created with the
    :meth:`~transformers.AutoModelWithLMHead.from_pretrained` class method or the
    :meth:`~transformers.AutoModelWithLMHead.from_config` class method.

    This class cannot be instantiated directly using ``__init__()`` (throws an error).

    .. warning::

        This class is deprecated and will be removed in a future version. Please use
        :class:`~transformers.AutoModelForCausalLM` for causal language models,
        :class:`~transformers.AutoModelForMaskedLM` for masked language models and
        :class:`~transformers.AutoModelForSeq2SeqLM` for encoder-decoder models.
    """

This may not be an essential question, but where do the category names CausalLM, MaskedLM, and Seq2SeqLM come from?
Are they original to the transformers library?
I would like to know more about the source of the terms used in the library.

Thank you in advance.

sgugger · December 21, 2020, 6:08pm

The differences between the three are explained in the model summary. The terms causal language modeling and masked language modeling are used very often in research papers, so they don’t come from the transformers library.
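
As a rough illustration of what the three terms refer to, here is a toy sketch in plain Python. It is not code from the transformers library: the helper names (`causal_lm_examples`, `masked_lm_example`, `seq2seq_examples`), the `MASK` placeholder string, and the `REPLACEMENT_CLASS` table are made up for this example, and real models work on token ids and use special start tokens that are left out here. Only the three class names come from the deprecation warning quoted in the question.

```python
# Toy illustration only: plain Python, no model and no transformers import.
MASK = "[MASK]"  # placeholder string; real tokenizers define their own mask token


def causal_lm_examples(tokens):
    """Causal LM: predict each token from the tokens to its left only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(tokens, position):
    """Masked LM: hide one token and predict it from context on both sides."""
    corrupted = list(tokens)
    target = corrupted[position]
    corrupted[position] = MASK
    return corrupted, target


def seq2seq_examples(source, target):
    """Seq2seq LM: condition on the full source plus the target prefix so far."""
    return [((source, target[:i]), target[i]) for i in range(len(target))]


# Replacement classes named in the deprecation warning, keyed by objective.
REPLACEMENT_CLASS = {
    "causal": "AutoModelForCausalLM",
    "masked": "AutoModelForMaskedLM",
    "seq2seq": "AutoModelForSeq2SeqLM",
}

sentence = ["the", "cat", "sat"]
print(causal_lm_examples(sentence))    # (left context, next token) pairs
print(masked_lm_example(sentence, 1))  # (sentence with one token hidden, hidden token)
print(seq2seq_examples(["le", "chat"], ["the", "cat"]))
```

Each function returns (visible context, token to predict) pairs, which is the main thing that distinguishes the three categories: left context only, context on both sides, or a separate source sequence plus the target prefix.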

yusukemori · December 21, 2020, 6:18pm

Thank you for the link to the documentation. I had seen the page, but it seems I had not understood it well enough. I will take a closer look.
After your explanation, I understand that causal language modeling and masked language modeling are common terms.
Thank you again.
