Model trains with Seq2SeqTrainer but gets stuck using Trainer

Hi,

I've been trying to fine-tune BART-large pre-trained on MNLI with the Financial PhraseBank dataset to build a model for news sentiment analysis. I'm just a beginner, so I mostly use the code from GEM Getting Started.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainer, Seq2SeqTrainingArguments

tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')
model = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-mnli')

The model only trains when I use AutoModelForSeq2SeqLM, Seq2SeqTrainer and Seq2SeqTrainingArguments. When I use model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli') with Trainer and TrainingArguments, the model does not train.

Is it appropriate to use seq2seq for sentiment classification tasks?

Any suggestions would be immensely helpful.

Thanks in advance.

Are you framing your classification problem as a sequence generation task? What types of labels do you have for your training data? Are the labels text/sequence or a finite number of categories? If your task is classification, I believe you're using the wrong model class. You could probably use BertForSequenceClassification for a sentiment analysis task, as has been done in the link below:

And instead of using Seq2SeqTrainer, just use Trainer and TrainingArguments.
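
Roughly, a minimal sketch of what that could look like. I'm assuming the dataset is loaded from the Hub as financial_phrasebank with the sentences_allagree config, which has a "sentence" text column and an integer "label" column; adjust the names if you load your data differently:

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

model_name = 'facebook/bart-large-mnli'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# bart-large-mnli already has a 3-way classification head, so num_labels=3 matches the checkpoint
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Financial PhraseBank only ships a "train" split, so hold out part of it for evaluation
dataset = load_dataset('financial_phrasebank', 'sentences_allagree')['train']
dataset = dataset.train_test_split(test_size=0.1)

def preprocess(batch):
    # labels stay as integer class ids; only the text gets tokenized
    return tokenizer(batch['sentence'], truncation=True, max_length=128)

dataset = dataset.map(preprocess, batched=True)

training_args = TrainingArguments(
    output_dir='bart-financial-phrasebank',
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    tokenizer=tokenizer,  # lets the default collator pad each batch dynamically
)
trainer.train()

The key difference from your seq2seq setup is that the labels here are plain integers (0/1/2), not text, so Trainer's cross-entropy loss gets exactly one class id per example.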


Thanks a lot for replying.

Are you framing your classification problem as a sequence generation task?

I don't know what this means. What I've been trying to do is: given a news headline, predict whether its sentiment is positive or negative, just like the examples in the Financial PhraseBank dataset.

What types of labels do you have for your training data?

I'm trying to use the bart-large-mnli model and fine-tune it on the Financial PhraseBank dataset. The data has three labels: positive, neutral and negative.

Are the labels text/sequence or a finite number of categories?

They're a finite set of categories: positive, negative and neutral.

If your task is classification, I believe you're using the wrong model class. You could probably use BertForSequenceClassification for a sentiment analysis task, as has been done in the link below:

Thanks for the link. I don't know why, but if I use TrainingArguments and Trainer, I either get a "CUDA out of memory" error or "Expected input batch_size to match target batch_size".

Thanks again for your inputs.

CUDA out of memory happens when your model needs more memory than the GPU can offer. You could try reducing the batch size or turning on gradient_checkpointing, or alternatively use a GPU with more memory.
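
Concretely, something along these lines in your TrainingArguments might get you past the OOM (the numbers are just illustrative starting points):

training_args = TrainingArguments(
    output_dir='bart-financial-phrasebank',
    per_device_train_batch_size=2,   # smaller per-step batches use less GPU memory
    gradient_accumulation_steps=8,   # keeps the effective batch size at 16
    gradient_checkpointing=True,     # trades extra compute for lower memory (needs a fairly recent transformers)
    fp16=True,                       # mixed precision also cuts memory on supported GPUs
)

The other error ("Expected input batch_size to match target batch_size") usually means the labels aren't a single class id per example, e.g. they're still text or tokenized sequences, so make sure positive/neutral/negative are mapped to integers such as 0/1/2 before training.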


Thank you so much for your suggestions!!
