In many fine-tuning setups with the transformers library, the final linear layer is stored in a variable named 'lm_head'.
What does that name mean?
Linear model head?
Language model head?
Wav2Vec2ForCTC also uses lm_head, and that sounds odd to me.
Wav2Vec2 is a speech model, not an NLP model!
Is the name wrong?