What does LM head mean?

lIlBrother · August 17, 2022, 8:16am

In many of the transformers fine-tuning tasks, the linear layer's variable name is ‘lm_head’.

What does that mean?
Linear model head?
Language model head?

Wav2VecForCTC also uses lm_head, but that sounds weird to me.
Wav2Vec is not an NLP model!
Is the name wrong?

smallx-ai · May 9, 2023, 8:27am

It’s the language modeling head.

abhi11nav · June 4, 2023, 2:55am

LM head is the language modelling head. The output of the transformer is a tensor of shape (batch_size, max_target_len, model_dimension). In the final step, where you convert these transformer outputs to words, you first project them linearly and then apply a softmax over the result, which gives the probability that position i in the target sequence is a certain word in the vocabulary. The layer where all of this happens is the LM head.
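
To make the projection-plus-softmax step concrete, here is a minimal sketch in plain Python. It is not the transformers implementation: the toy sizes, the random weights, and the helper names `lm_head` and `softmax` are assumptions made up for illustration, and it handles a single sequence (no batch dimension).

```python
import math
import random

def lm_head(hidden, weight):
    """Linearly project one hidden vector (model_dimension) to vocabulary logits."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]

def softmax(logits):
    """Turn logits into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
model_dimension, vocab_size, max_target_len = 4, 6, 3  # toy sizes (assumed)

# Hypothetical LM head weights of shape (vocab_size, model_dimension).
weight = [[random.uniform(-1, 1) for _ in range(model_dimension)]
          for _ in range(vocab_size)]

# Stand-in for the transformer output: (max_target_len, model_dimension).
hidden_states = [[random.uniform(-1, 1) for _ in range(model_dimension)]
                 for _ in range(max_target_len)]

# One probability distribution over the vocabulary per target position.
probs = [softmax(lm_head(h, weight)) for h in hidden_states]

# Greedy choice: the most likely vocabulary id at each position.
predicted_ids = [max(range(vocab_size), key=p.__getitem__) for p in probs]
```

Each entry of `probs` sums to 1, and `predicted_ids` holds one vocabulary index per position. A real model would map those indices back to tokens with its tokenizer.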

VGan · September 26, 2023, 7:49am

A little bit confused about how pretraining works
Is the LM head also used during pretraining? Like if pretraining is just trying to predict the next token, then the Conditional LM head would allow for this right?

berndf · September 26, 2023, 8:58am

The head is not used during pre-training in my understanding, but only afterward during fine-tuning. Here is what ChatGPT says when asked “What is the ‘head’ of a Large Language Model?” (I checked this and I think it is a good explanation):

In the context of Large Language Models (LLMs) like GPT-3 or BERT, the term “head” refers to the additional layers or mechanisms added on top of the pre-trained base model to adapt it for specific tasks. These could range from classification layers for tasks like sentiment analysis to more complex architectures for tasks like machine translation or question answering.

Common Types of Heads:

Classification Head: For tasks like text classification, a fully connected (dense) layer is usually added to the output of the base model, followed by a softmax activation to produce class probabilities.
Regression Head: For regression tasks, a dense layer may be added without a softmax activation, designed to output a continuous value.
Token Classification Head: For named entity recognition or part-of-speech tagging, a token-level classifier is usually added to assign labels to each token in the input sequence.
Sequence-to-Sequence Head: For tasks like translation or summarization, a decoder mechanism may be added to generate a sequence of tokens as output.
Question-Answering Head: For QA tasks, the model might have two dense layers to predict the start and end positions of the answer span within the context text.

The specific architecture of the “head” would depend on the task it’s designed for. The idea is to fine-tune these additional layers on task-specific data to adapt the general language understanding capabilities of the LLM to the specific requirements of the task at hand.
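
The head types listed above can be sketched as small layers applied to the same base-model output. This is a rough illustration in plain Python, not any library's API: the sizes, the random weights, the helpers `linear`, `softmax`, and `make_layer`, and the choice of the first token's vector as the sequence summary are all assumptions. The question-answering head is written as one layer with two outputs per token, which plays the same role as two separate dense layers.

```python
import math
import random

def linear(x, weight, bias):
    """Dense layer: one output per weight row."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weight, bias)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_layer(rng, out_features, in_features):
    """Hypothetical helper: random weights and zero bias for a dense layer."""
    weight = [[rng.uniform(-1, 1) for _ in range(in_features)]
              for _ in range(out_features)]
    return weight, [0.0] * out_features

rng = random.Random(0)
hidden_size, seq_len = 4, 5  # toy sizes (assumed)

# Stand-in for the base model's output: one vector per input token.
base_output = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
               for _ in range(seq_len)]
pooled = base_output[0]  # assumption: first token summarizes the sequence

# Classification head: dense layer + softmax over 3 classes.
class_probs = softmax(linear(pooled, *make_layer(rng, 3, hidden_size)))

# Regression head: dense layer with a single output and no softmax.
regression_value = linear(pooled, *make_layer(rng, 1, hidden_size))[0]

# Token classification head: the same classifier applied to every token.
tok_w, tok_b = make_layer(rng, 2, hidden_size)
token_label_probs = [softmax(linear(h, tok_w, tok_b)) for h in base_output]

# Question-answering head: start and end scores for every token.
qa_w, qa_b = make_layer(rng, 2, hidden_size)
qa_logits = [linear(h, qa_w, qa_b) for h in base_output]
start_index = max(range(seq_len), key=lambda i: qa_logits[i][0])
end_index = max(range(seq_len), key=lambda i: qa_logits[i][1])
```

Only the small head layers differ between tasks here; `base_output` is shared, which is the point of swapping heads during fine-tuning.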

