I am trying to use BioGPT as a feature encoder, and I want to test whether fine-tuning improves the quality of the embeddings.
So I have two options. The first is to fine-tune BioGPT without passing the labels, and then use the last token of the last hidden state as features for classification with a separate machine-learning model. (Is it possible to fine-tune BioGPT as an encoder with the labels? Do the labels make any difference, since the model is not attempting to classify?)
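For concreteness, this is the feature extraction I mean for the first option, sketched with random tensors standing in for real BioGPT outputs (so no model download is needed; the shapes and the indexing are the part that matters). Note that for a left-to-right model with right padding, "last token" has to mean the last *non-padding* token of each sequence:

```python
import torch

batch, seq_len, hidden = 4, 16, 1024          # 1024 = BioGPT's hidden size

# Stand-ins for model(input_ids, attention_mask=...).last_hidden_state
# and the attention mask produced by the tokenizer.
last_hidden_state = torch.randn(batch, seq_len, hidden)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
attention_mask[0, 10:] = 0                    # sequence 0 is padded after token 9

# Index of the last real (non-padding) token in each sequence.
last_token_idx = attention_mask.sum(dim=1) - 1            # shape: (batch,)
features = last_hidden_state[torch.arange(batch), last_token_idx]

print(features.shape)                         # one embedding vector per sequence
```

The resulting `features` matrix (one row per sequence) is what I would then hand to a separate classifier such as scikit-learn's `LogisticRegression`.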
The second option would be to use BioGptForSequenceClassification, which has a sequence classification head (a linear layer) on top, and fine-tune it by passing the labels to the model. I could then either use the fine-tuned model directly for classification, or again take the last token of the last hidden state as features for a separate machine-learning classifier.
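As I understand it, the classification head in the second option amounts to a single linear layer over the pooled last-token hidden state, and passing labels simply adds a cross-entropy loss that back-propagates through the whole encoder during fine-tuning. A minimal sketch of just that head, with a random tensor standing in for the real last-token hidden states (the name `score` mirrors what I believe the Hugging Face head is called, but that is an assumption):

```python
import torch
import torch.nn as nn

batch, hidden, num_labels = 4, 1024, 2        # 1024 = BioGPT's hidden size

# Stand-in for the last-token hidden states coming out of the encoder;
# requires_grad simulates gradients flowing back into the base model.
pooled = torch.randn(batch, hidden, requires_grad=True)
labels = torch.tensor([0, 1, 1, 0])

score = nn.Linear(hidden, num_labels, bias=False)   # the classification head
logits = score(pooled)                              # shape: (batch, num_labels)
loss = nn.functional.cross_entropy(logits, labels)  # only exists because of labels
loss.backward()                                     # gradients reach the encoder

print(logits.shape, loss.item())
```

This is why I suspect the labels only matter in option 2: without them there is no loss term tied to the classes, so option 1's fine-tuning can only be a language-modeling objective.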