I've been trying to fine-tune BART-large pre-trained on MNLI with the Financial PhraseBank dataset to build a model for news sentiment analysis. I'm just a beginner, so I mostly use the code from GEM Getting Started.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainer, Seq2SeqTrainingArguments
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-mnli")
The model only trains when I use AutoModelForSeq2SeqLM, Seq2SeqTrainer and Seq2SeqTrainingArguments. When I use model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli") with Trainer and TrainingArguments, the model does not train.
Is it appropriate to use seq2seq for sentiment classification tasks?
Are you framing your classification problem as a sequence generation task? What types of labels do you have for your training data? Are the labels text/sequences or a finite number of categories? If your task is classification, I believe you're using the wrong model class. You could probably use BertForSequenceClassification for a sentiment analysis task, as has been done in the link below:
And instead of using Seq2SeqTrainer, just use Trainer and TrainingArguments.
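Roughly, that setup could look like the sketch below. It's only a minimal example under a few assumptions (the Hub's financial_phrasebank dataset with the sentences_allagree config, its sentence/label column names, and placeholder hyperparameters), not a drop-in replacement for your exact pipeline:

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# bart-large-mnli already ships a 3-way head (MNLI: contradiction/neutral/entailment),
# so num_labels=3 loads cleanly; the head just gets repurposed for
# negative/neutral/positive during fine-tuning.
model_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Financial PhraseBank on the Hub only ships a "train" split, so carve out a test set.
dataset = load_dataset("financial_phrasebank", "sentences_allagree")["train"]
dataset = dataset.train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # only the text is tokenized; labels stay as integer class ids (0/1/2)
    return tokenizer(batch["sentence"], truncation=True)

tokenized = dataset.map(preprocess, batched=True)

training_args = TrainingArguments(
    output_dir="bart-finphrasebank",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # lets Trainer pad batches dynamically via DataCollatorWithPadding
)
trainer.train()

The important part is that the labels are plain integers and the model class has a classification head, so the Trainer computes a cross-entropy loss over 3 classes instead of a token-level generation loss.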
Are you framing your classification problem as a sequence generation task?
I don't know what this means. What I've been trying to do is: given a news headline, predict its sentiment as positive or negative, just like in the Financial PhraseBank dataset.
what types of labels do you have for your training data?
I'm trying to use the bart-large-mnli model and fine-tune it on the Financial PhraseBank dataset. The Financial PhraseBank data has 3 labels: positive, neutral and negative.
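For reference, this is roughly how I'm loading it (the sentences_allagree config is just the one I picked; the label ids and names come from the dataset itself):

from datasets import load_dataset

# one of the agreement-level configs of Financial PhraseBank on the Hub
phrasebank = load_dataset("financial_phrasebank", "sentences_allagree")

# "label" is a ClassLabel column, so this prints the mapping from integer ids to names
print(phrasebank["train"].features["label"].names)
print(phrasebank["train"][0])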
Are the labels text/sequence or a finite number of categories?
They're a finite number of categories: positive, negative and neutral.
If your task is classification, I believe you're using the wrong model class. You could probably use BertForSequenceClassification for a sentiment analysis task, as has been done in the link below:
Thanks for the link. I don't know why, but if I use TrainingArguments and Trainer, I get either a "CUDA out of memory" error or "Expected input batch_size to match target batch_size".
"CUDA out of memory" happens when your model uses more memory than the GPU can offer. You could try reducing the batch size or turning on gradient_checkpointing, or alternatively use a GPU with more memory.
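As a rough illustration of those knobs (the argument names are standard TrainingArguments options; the values are only placeholders to tune for your GPU):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bart-finphrasebank",
    per_device_train_batch_size=2,   # try lowering this first
    gradient_accumulation_steps=8,   # keeps an effective batch of 16 while cutting peak memory
    gradient_checkpointing=True,     # recompute activations in the backward pass to save memory
    fp16=True,                       # mixed precision further reduces memory on most NVIDIA GPUs
)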