Clarification on heads, layers, training, and output

Hi,

I am using TensorFlow to do multi-class sentence classification with XLM-R base (`jplu/tf-xlm-roberta-base`) on a custom dataset.

  1. It seems that HF provides different heads for different tasks (e.g. token classification, sequence classification), and the models differ only in the head attached to the end of the base architecture. So, if I am not starting from a fine-tuned checkpoint, I have to fine-tune that head as well. My first question: can we swap in a different head (say, a custom feed-forward network or an LSTM-based network) for the same task while keeping the pre-trained base model? (Sketch 1 after my output below shows roughly what I mean.) And can we see the details of the head that comes by default for a particular task (i.e. the randomly initialized head for sequence classification)?

  2. Can we access the outputs of different layers of the model, especially the outputs just before the final head? Is there an API/method for that in Hugging Face? And can we prune layers of the model (say, of XLM-R base)? (Sketch 2 below.)

  3. When we fine-tune, is it global fine-tuning (updating the base model's parameters as well as the head's) or just feature extraction (freezing the base model's pre-trained parameters and training only the head)? Can we do whichever of the two is not the default? (Sketch 3 below.)

  4. Final question: `Trainer.predict()` gives me a NumPy array. I guess the values in it are logits? For multi-class sentence classification (with 4 classes), do these raw values mean that no softmax is applied in the final head? How should I interpret this output? (Sketch 4 below is my current guess.) Below is my output.

PredictionOutput(predictions=array([[ 1.2991945e+00,  2.4860173e-01,  5.5320925e-01, -1.6669977e+00],
       [ 4.3599471e-01, -5.0883066e-02,  3.4532386e-01, -5.2039641e-01],
       [ 8.9458901e-01,  1.2760645e+00,  3.1270528e-01, -1.6415002e+00],
       [ 9.0530002e-01,  1.0148852e+00,  2.6518843e-01, -1.4662132e+00],
       [ 4.5786294e-01, -5.0590429e-02,  2.0140493e-01, -4.1767478e-01],
       [ 5.7495612e-01,  4.7848277e-02,  1.5834071e-01, -6.1066955e-01],
       [ 4.6566209e-01, -2.0567745e-02,  2.1055032e-01, -4.6179143e-01],
       [ 5.0190979e-01, -4.8803892e-02,  1.9314916e-01, -4.8067909e-01],
       [ 5.7928652e-01,  6.7762680e-02,  2.0994107e-01, -6.0617983e-01],
       [ 5.3082645e-01, -2.2240670e-02,  3.7937027e-01, -6.5518349e-01],
       [ 5.8990896e-01, -5.2324682e-02,  3.2848221e-01, -6.4274567e-01],
       [ 5.4325098e-01, -2.4219263e-02,  2.2602598e-01, -6.0041779e-01],
       [ 4.9988240e-01, -1.4048552e-03,  3.3386120e-01, -5.7529330e-01],
       [ 4.2276594e-01, -5.3270590e-02,  1.9273782e-01, -3.8076913e-01],
       [ 4.8189813e-01, -5.7544138e-02,  2.1740533e-01, -4.4707236e-01],
       [ 5.3467524e-01, -8.4268771e-02,  3.8555554e-01, -6.2313312e-01],
       [ 3.8383359e-01, -8.2594566e-02,  1.8413506e-01, -3.2918590e-01],
       [ 1.0014045e+00,  2.6587926e-02,  1.0125093e+00, -1.6064054e+00],
       [ 5.6751728e-01, -2.3115154e-02,  2.0180833e-01, -5.6251198e-01],
       [ 5.4358459e-01, -4.8401270e-02,  2.9657021e-01, -5.8822620e-01]],
      dtype=float32), label_ids=array([3, 2, 1, 1, 2, 1, 2, 0, 2, 0, 0, 0, 2, 2, 1, 0, 1, 2, 0, 0]), metrics={'eval_loss': 1.224756399790446, 'eval_accuracy': 0.5, 'eval_precision': 0.7441176470588236, 'eval_recall': 0.5})
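
Sketch 1 (for question 1): here is roughly what I mean by replacing the head while keeping the pre-trained base. This is an untested sketch using the Keras functional API; `NUM_CLASSES` and `MAX_LEN` are just my placeholders.

```python
import tensorflow as tf
from transformers import TFXLMRobertaModel

NUM_CLASSES = 4  # placeholder
MAX_LEN = 128    # placeholder

# Pre-trained base model, with no task head on top
base = TFXLMRobertaModel.from_pretrained("jplu/tf-xlm-roberta-base")

input_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="attention_mask")

sequence_output = base(input_ids, attention_mask=attention_mask)[0]  # (batch, seq, hidden)
cls_token = sequence_output[:, 0, :]  # <s> token, RoBERTa's analogue of [CLS]

# Custom feed-forward head in place of the default classification head
x = tf.keras.layers.Dense(256, activation="tanh")(cls_token)
x = tf.keras.layers.Dropout(0.1)(x)
logits = tf.keras.layers.Dense(NUM_CLASSES, name="classifier")(x)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=logits)
```

For the default head itself, I assume I can just read the source of the corresponding `TF...ForSequenceClassification` class, but it would be good to confirm.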
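
Sketch 2 (for question 2): the kind of access I am after, assuming a recent transformers version where `output_hidden_states=True` exposes every layer's output.

```python
from transformers import TFXLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = TFXLMRobertaModel.from_pretrained("jplu/tf-xlm-roberta-base",
                                          output_hidden_states=True)

inputs = tokenizer("A test sentence.", return_tensors="tf")
outputs = model(inputs)

# Tuple of (num_layers + 1) tensors: the embeddings plus one per layer,
# each of shape (batch, seq_len, hidden). The last one is what feeds the head.
hidden_states = outputs.hidden_states
```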
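
Sketch 3 (for question 3): what I imagine the non-default variant (feature extraction) would look like, assuming the base transformer is exposed as `model.roberta`.

```python
from transformers import TFXLMRobertaForSequenceClassification

model = TFXLMRobertaForSequenceClassification.from_pretrained(
    "jplu/tf-xlm-roberta-base", num_labels=4)

# Freeze the pre-trained base so only the randomly initialized head trains
model.roberta.trainable = False
model.summary()  # trainable parameters should now be the head's only
```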
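
Sketch 4 (for question 4): my current guess at interpreting the predictions, with `trainer` and `test_dataset` from my own setup. I assume the loss function applies the softmax internally, so the head emits raw logits.

```python
import numpy as np
import tensorflow as tf

pred_out = trainer.predict(test_dataset)
logits = pred_out.predictions                   # raw, unnormalized scores
probs = tf.nn.softmax(logits, axis=-1).numpy()  # per-class probabilities
pred_labels = np.argmax(logits, axis=-1)        # argmax is the same on logits or probs
```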

Sorry if this is too long; I thought I would ask these all together since they are somewhat related.
Thanks
