I've been trying to fine-tune BART-large pre-trained on MNLI with the Financial PhraseBank dataset to build a model for news sentiment analysis. I'm just a beginner, so I mostly use the code from GEM Getting Started.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainer, Seq2SeqTrainingArguments
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-mnli")
The model only trains when I use AutoModelForSeq2SeqLM, Seq2SeqTrainer and Seq2SeqTrainingArguments. When I use model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli") with Trainer and TrainingArguments, the model does not train.
Is it appropriate to use seq2seq for sentiment classification tasks?
Are you framing your classification problem as a sequence generation task? What types of labels do you have for your training data? Are the labels text/sequences or a finite number of categories? If your task is classification, I believe you're using the wrong model class. You could probably use BertForSequenceClassification for a sentiment analysis task, as has been done in the link below:
And instead of using Seq2SeqTrainer, just use Trainer and TrainingArguments.
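Roughly, that setup could look like the sketch below. It's only a minimal example under a few assumptions (the Hub's financial_phrasebank dataset with the sentences_allagree config, its sentence/label column names, and placeholder hyperparameters), not a drop-in replacement for your exact pipeline:

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# bart-large-mnli already ships a 3-way head (MNLI: contradiction/neutral/entailment),
# so num_labels=3 loads cleanly; the head just gets repurposed for
# negative/neutral/positive during fine-tuning.
model_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Financial PhraseBank on the Hub only ships a "train" split, so carve out a test set.
dataset = load_dataset("financial_phrasebank", "sentences_allagree")["train"]
dataset = dataset.train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # only the text is tokenized; labels stay as integer class ids (0/1/2)
    return tokenizer(batch["sentence"], truncation=True)

tokenized = dataset.map(preprocess, batched=True)

training_args = TrainingArguments(
    output_dir="bart-finphrasebank",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # lets Trainer pad batches dynamically via DataCollatorWithPadding
)
trainer.train()

The important part is that the labels are plain integers and the model class has a classification head, so the Trainer computes a cross-entropy loss over 3 classes instead of a token-level generation loss.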
Are you framing your classification problem as a sequence generation task?
I don't know what this means. What I've been trying to do is: given a news headline, predict its sentiment as positive or negative, just like in the Financial PhraseBank dataset.
what types of labels do you have for your training data?
I'm trying to use the bart-large-mnli model and fine-tune it on the Financial PhraseBank dataset. The Financial PhraseBank data has 3 labels: positive, neutral and negative.
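For reference, this is roughly how I'm loading it (the sentences_allagree config is just the one I picked; the label ids and names come from the dataset itself):

from datasets import load_dataset

# one of the agreement-level configs of Financial PhraseBank on the Hub
phrasebank = load_dataset("financial_phrasebank", "sentences_allagree")

# "label" is a ClassLabel column, so this prints the mapping from integer ids to names
print(phrasebank["train"].features["label"].names)
print(phrasebank["train"][0])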
Are the labels text/sequence or a finite number of categories?
They're a finite number of categories: positive, negative and neutral.
If your task is classification, I believe you're using the wrong model class. You could probably use BertForSequenceClassification for a sentiment analysis task, as has been done in the link below:
Thanks for the link. I don't know why, but if I use TrainingArguments and Trainer, I get either a "CUDA out of memory" error or "Expected input batch_size to match target batch_size".
"CUDA out of memory" happens when your model uses more memory than the GPU can offer. You could try reducing the batch size or turning on gradient_checkpointing, or alternatively use a GPU with more memory.
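As a rough illustration of those knobs (the argument names are standard TrainingArguments options; the values are only placeholders to tune for your GPU):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bart-finphrasebank",
    per_device_train_batch_size=2,   # try lowering this first
    gradient_accumulation_steps=8,   # keeps an effective batch of 16 while cutting peak memory
    gradient_checkpointing=True,     # recompute activations in the backward pass to save memory
    fp16=True,                       # mixed precision further reduces memory on most NVIDIA GPUs
)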